Exploiting Hybrid-Channel Information For Downlink Multi-User Scheduling

ABSTRACT

A method for determining an optimal network utilization maximization in a communication system with wireless links in which current and coarse channel state information (CSI) is available from all users, along with a limited amount of fine CSI, by way of frame-based scheduling and feedback under which a virtual queue is associated with each user, with virtual rates being determined at the start of each frame and the policy of each frame being determined by solving a decision process.

RELATED APPLICATION INFORMATION

This application claims priority to provisional application No. 61/754,364, entitled “Exploiting Hybrid Channel Information for MU-MIMO Scheduling”, filed Jan. 18, 2013, the contents thereof are incorporated herein by reference.

BACKGROUND OF THE INVENTION

The present invention relates generally to wireless communications, and more particularly, to exploiting hybrid channel information for downlink multi-user scheduling.

Consider a cellular downlink with multiple users (receivers) and a base-station transmitter, where multiple antennas are present at the transmitter. Over such a downlink, linear transmit precoding is known to be advantageous in enabling simultaneous transmissions to multiple users over the same time-frequency resource (a.k.a. multi-user multiple-input multiple-output, MU-MIMO).

However, the conventional approach for MU-MIMO requires accurate and timely (i.e., delay free) channel state information (CSI) from all the users in order to construct such precoders. Unfortunately, obtaining such accurate and timely CSI is near impossible in practical cellular systems and in the absence of such CSI, the conventional MU-MIMO breaks down and does not offer any gains.

The design of linear precoding schemes and user scheduling algorithms has been considered for multi-user multiple-input multiple-output (MU-MIMO) systems under the assumption that accurate and timely (i.e., delay free) estimates of CSI from all users can be obtained. These estimates are determined based on the CSI feedback from the users. Prediction based approaches to obtain such estimates have been proposed for the case where the delay in CSI feedback is small enough and a model for the channel evolution is available. However, at-least one of the latter two assumptions typically does not hold.

A new approach suggested for MU-MIMO, referred to as the MAT scheme, has demonstrated that accurate but arbitrarily delayed CSI can also be exploited to achieve considerable MU-MIMO gains. However, the MAT scheme was proposed for a rather simple setup in which network utility maximization (NUM) via optimized scheduling was not incorporated, and thus it cannot be used over practical systems where the number of users is usually much larger than the number of streams that can be simultaneously scheduled over any time-frequency resource, which makes user scheduling necessary.

Accordingly, there is a need for exploiting hybrid channel information for downlink multi-user scheduling that overcomes the above mentioned limitations.

BRIEF SUMMARY OF THE INVENTION

The present invention is directed to a method, implemented by a computer, that includes determining an optimal network utilization maximization in a communication system with wireless links in which current and coarse channel state information CSI is available from all users, along with a limited amount of fine CSI by way of a frame based scheduling and feedback under which a virtual queue is associated with each user with virtual rates being determined at start of each frame and policy of each frame being determined by solving a decision process, the determining includes initializing an index indicative of an interval, a system state and control parameters, and if the interval is not the start of a frame then determining action for current interval using system state for current interval and the state action frequencies computed for the current frame, performing action for current interval, updating virtual queues and incrementing the interval index; and updating system state for the current interval.

In an alternative expression of the invention, a system includes a communication system with wireless links in which current and coarse channel state information CSI is available from all users, along with a limited amount of fine CSI by way of a frame based scheduling and feedback under which a virtual queue is associated with each user with virtual rates being determined at start of each frame and policy of each frame being determined by solving a decision process for determining an optimal network utilization maximization, the system includes computer processing for carrying out the following: initializing an index indicative of an interval, a system state and control parameters; and if the interval is not the start of a frame then: determining action for current interval using system state for current interval and the state action frequencies computed for the current frame, performing action for current interval, updating virtual queues and incrementing the interval index, and updating system state for the current interval.

These and other advantages of the invention will be apparent to those of ordinary skill in the art by reference to the following detailed description and the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow diagram for joint scheduling and feedback to maximize system utility over all throughput vectors that are achievable, in accordance with the invention;

FIG. 2 is a diagram depicting interference resolving for pending packets, discussed in conjunction with FIG. 1;

FIG. 3 shows an exemplary computer to perform the invention;

FIG. 4 is a graph comparing the sum utility rate obtained using conventional MU-MIMO that only uses the current CSI with that obtained using myopic scheduling that uses only delayed CSI (EMAT with delayed) and the myopic scheduling that uses the hybrid CSI (EMAT with hybrid), where for the latter two schemes the average rates are computed assuming both the sub-optimal and optimal filtering; and

FIG. 5 is a graph comparing the sum rate utility obtained using the myopic scheduling that uses hybrid CSI along with optimal filtering, for different codebook sizes.

DETAILED DESCRIPTION

The present invention is directed to a generalized version of the MAT scheme in which current and coarse CSI is available from all the users along with a limited amount of fine albeit delayed CSI. It then formulates a network utility maximization (NUM) framework for this generalized MAT scheme. In this framework, the fine CSI can only be obtained from each user pair that is scheduled over a time-frequency resource, after any specified delay.

The invention's framework can be supported by small changes to the 4G cellular standards specification in which coarse CSI can be reported by all users in a timely fashion using the sparse CSI reference symbols and the limited capacity feedback channel, whereas finer CSI can be determined only by the scheduled users using the dense demodulation reference symbols. A second round of feedback can be added to report such fine CSI incurring very small extra overhead. In addition, a round of feed-forward signaling (also with only a small overhead) is needed to convey certain cross channel information to the users. To solve the NUM problem the invention proposes a novel frame-based scheduling and feedback approach under which a virtual queue is associated with each user. Virtual arrival rates are determined at the start of each frame and the policy for each frame is determined by solving a Markov decision process. This approach is rigorously shown to be optimal and can be tailored to optimize a large class of utility functions.

This invention proposes a joint scheduling and feedback scheme to solve the following problem: Maximize system utility over all throughput vectors that are achievable.

The joint scheduling solution is depicted in a flow diagram shown in FIG. 1. In the description of the flowchart steps that follows, the equation numbers and section numbers cited correspond to the same-numbered equations and sections hereinafter.

Step 100: Initialize the interval index to 0 and input the system state for interval 0 along with control parameters V, r_(max). The system state at any interval consists of the coarse channel estimates reported by all the N users for that interval. In addition, the system state includes for each possible pair of users, the fine and coarse channel estimates from both users in that pair for the recent-most prior interval on which that user pair was scheduled and the index of that interval.

Step 102: A check is undertaken if the interval k is the start of a frame (i.e., k=τT, τ=0, 1, . . . ) where T is the number of intervals in each frame.

Step 104: If the interval k is not the start of a frame, then the system state for the current interval, S[k], and the state-action frequencies computed for the current frame are used to determine the action for the current interval.

Using the Bayesian rule, we can identify the corresponding stationary policy Ψ*_(Q[τT]), which at any interval k in the τ^(th) frame first maps the state S[k] to its counterpart s ∈ S. Then, if Σ _(a′)x*(s, a′)>0, it chooses action a using the probabilistic rule

${{P\left( {{pick}\mspace{14mu} \underset{\_}{a}\mspace{14mu} {at}\mspace{14mu} {state}\mspace{14mu} \underset{\_}{s}} \right)} = \frac{x^{*}\left( {\underset{\_}{s},\underset{\_}{a}} \right)}{\sum_{{\underset{\_}{a}}^{\prime}}{x^{*}\left( {\underset{\_}{s},{\underset{\_}{a}}^{\prime}} \right)}}},{\forall{\underset{\_}{a} \in {\underset{\_}{}.}}}$

On the other hand, if Σ _(a′)x*(s, a′)=0, it chooses action a arbitrarily. Let R^(frame)[k], τT≦k≦(τ+1)T−1, denote the service rate vectors obtained under this policy for the intervals in the τ^(th) frame.
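The probabilistic rule of Step 104 can be illustrated with a minimal Python sketch. The dictionary representation of the state-action frequencies and the function names `policy_probabilities` and `pick_action` are assumptions made for illustration only and are not part of the described method.

```python
import random

def policy_probabilities(x_star, state, actions):
    """Conditional action probabilities at `state`, or None if the state has zero mass.

    x_star: dict mapping (state, action) -> nonnegative state-action frequency
            (hypothetical representation chosen for this sketch).
    """
    weights = [x_star.get((state, a), 0.0) for a in actions]
    total = sum(weights)
    if total <= 0:
        return None
    # P(pick a at s) = x*(s, a) / sum over a' of x*(s, a')
    return {a: w / total for a, w in zip(actions, weights)}

def pick_action(x_star, state, actions, rng=random):
    """Sample an action according to the rule above; arbitrary if the state has zero mass."""
    probs = policy_probabilities(x_star, state, actions)
    if probs is None:
        return rng.choice(actions)          # any action is permitted here
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]

# Example with two actions at one state (values are illustrative only).
x_star = {("s0", "a"): 0.125, ("s0", "b"): 0.375}
print(policy_probabilities(x_star, "s0", ["a", "b"]))
```

Normalizing per state is what turns the joint frequencies x*(s, a) into a stationary randomized policy; states with zero total frequency are never visited in steady state, which is why an arbitrary action suffices there.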

We list the following results which can be obtained using those that have been derived before for weakly communicating Markov Decision Processes [10][11].

See Section III-B, State-action frequency approach, in the Additional Information of the application for further discussion, including the procedure discussed in the paragraph after Equation (18).

Step 106: The action determined in Step 104 is performed. The action is performed over the three orthogonal slots in an interval. In the first slot new packets are transmitted to the selected user pair whereas in slots 2 and 3 interference resolution is performed for the selected pair by sending interference resolving packets for a previous (most-recent) pending transmission involving the selected pair. This is depicted by the diagram of FIG. 2 and in conjunction with the following equations (2) to (7).

$y_{u_1}[k,1] = h_{u_1}[k]\left(x_{u_1}[k] + x_{u_2}[k]\right) + n_{u_1}[k,1], \qquad (2)$

$y_{u_2}[k,1] = h_{u_2}[k]\left(x_{u_1}[k] + x_{u_2}[k]\right) + n_{u_2}[k,1]. \qquad (3)$

$y_{u_1}[k,2] = h_{u_1}[k]\,z[k,2]\left(\check{h}_{u_1}[\kappa]\,x_{u_2}[\kappa]\right) + n_{u_1}[k,2], \qquad (4)$

$y_{u_2}[k,2] = h_{u_2}[k]\,z[k,2]\left(\check{h}_{u_1}[\kappa]\,x_{u_2}[\kappa]\right) + n_{u_2}[k,2]. \qquad (5)$

$y_{u_1}[k,3] = h_{u_1}[k]\,z[k,3]\left(\check{h}_{u_2}[\kappa]\,x_{u_1}[\kappa]\right) + n_{u_1}[k,3], \qquad (6)$

$y_{u_2}[k,3] = h_{u_2}[k]\,z[k,3]\left(\check{h}_{u_2}[\kappa]\,x_{u_1}[\kappa]\right) + n_{u_2}[k,3]. \qquad (7)$

Equations (2) to (7) are discussed in greater detail in the Additional Information section of the application, II—C. Expected Transmission Rates (Rewards).

Step 108: The virtual queues are updated using Equation (15), and the interval index is incremented (k → k+1). Note that the service rates obtained for the scheduled user pair can (optionally) be determined based on ACK/NACK feedback from the users.

$Q_{n}[k+1] = \left(Q_{n}[k] - R_{n}^{\Psi^{*}_{Q[\tau T]}}[k]\right)^{+} + r^{*}_{n}[\tau], \qquad (15)$

Discussion concerning Equation (15) is provided in greater detail in the Additional Information section, III—A. Virtual Queue and Virtual Arrival Process, of the application.
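The virtual queue update of Step 108 can be sketched in a few lines of Python. The sketch assumes that the superscript + in Equation (15) denotes max(·, 0) applied to the difference between the queue value and the service rate; the function name `update_virtual_queues` and the list-based representation are hypothetical.

```python
def update_virtual_queues(queues, service_rates, arrival_rates):
    """Apply Q_n[k+1] = max(Q_n[k] - R_n[k], 0) + r_n[tau] to every user n.

    queues:        current virtual queue values Q_n[k]
    service_rates: service rates R_n[k] obtained in interval k
    arrival_rates: virtual arrival rates r_n[tau] fixed for the current frame
    """
    return [
        max(q - served, 0.0) + arrived
        for q, served, arrived in zip(queues, service_rates, arrival_rates)
    ]

# Example: user 0 is partially served, user 1 is served more than it has queued.
print(update_virtual_queues([2.0, 0.5], [1.0, 1.0], [0.25, 0.25]))
```

Clamping at zero keeps a queue from going negative when the service rate in an interval exceeds the queued amount.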

Step 110: The system state for the new current interval k is determined by updating the current coarse CSI in the previous system state to be the coarse CSI reported by all the users for the current interval. In addition, the tuple corresponding to the user pair scheduled in the previous interval is updated using the fine and coarse CSI received from both the users in that pair for the previous interval. Also, the respective fine cross channel estimates are fed-forward to both the users in that pair. The details are described in Section II-B, in the Additional Information section. The process loops back to step 102.

If in Step 102 the interval k is indeed the start of a frame the process moves to Step 112.

Step 112: The virtual queues are sampled. Details of the sampling are discussed in the paragraph above Equation (14) of the Additional Information section of the application, III—A. Virtual Queue and Virtual Arrival Process.

Step 114: Determine virtual arrival rates using sampled virtual queue values in Equation (14):

$\max_{r:\ \mathbf{0} \preceq r \preceq r_{\max}\mathbf{1}} \; V \cdot U(r) - \sum_{n=1}^{N} Q_{n}[\tau T]\, r_{n}. \qquad (14)$
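As an illustration of Step 114, the following Python sketch solves the virtual arrival rate problem for one particular utility. It assumes, purely for the example, the separable utility U(r) = Σ_n log(1 + r_n) and the box constraint 0 ≤ r_n ≤ r_(max) for every user; neither the utility choice nor the function name `virtual_arrival_rates` is taken from the application. Under this assumption each per-user objective V·log(1 + r_n) − Q_n r_n is concave, so setting its derivative to zero and clipping to the box gives the maximizer.

```python
def virtual_arrival_rates(sampled_queues, V, r_max):
    """Maximize V*sum(log(1+r_n)) - sum(Q_n*r_n) subject to 0 <= r_n <= r_max.

    The log(1 + r) utility is an assumption made for this sketch only.
    """
    rates = []
    for q in sampled_queues:
        if q <= 0:
            # With an empty queue the objective is increasing in r_n.
            rates.append(r_max)
        else:
            # Stationary point of V*log(1+r) - q*r is r = V/q - 1; clip to the box.
            rates.append(min(max(V / q - 1.0, 0.0), r_max))
    return rates

# Example: an empty queue, a moderate queue, and a heavily backlogged queue.
print(virtual_arrival_rates([0.0, 4.0, 100.0], V=10.0, r_max=3.0))
```

The example shows the intended behavior of the control parameter V: a larger V admits more virtual arrivals for the same backlog, while a large backlog throttles admissions to zero.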

Step 116: Use sampled virtual queue values to determine state-action frequencies by solving a linear program in equation (17), discussed in greater detail in the Additional Information section of the application, III—B. State-action frequency approach:

$\max_{x} \sum_{\underline{s},\underline{a}} q^{T} R(\underline{s},\underline{a})\, x(\underline{s},\underline{a}) \quad \text{s.t. } x \text{ in the set of feasible state-action frequencies}. \qquad (17)$

The process proceeds to Step 104.
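The control flow of Steps 100 to 116 can be summarized by the following Python sketch. All callables passed to `run_scheduler` (for solving Equations (14) and (17), choosing and performing an action, and updating the system state) are hypothetical placeholders introduced for this illustration; only the ordering of the steps follows the flow diagram of FIG. 1.

```python
def run_scheduler(num_intervals, T, initial_state, queues,
                  solve_arrival_rates, solve_frequencies,
                  choose_action, perform_action, next_state):
    """Frame-based loop: per-frame planning, per-interval action and updates."""
    state = initial_state                       # Step 100: initialization
    arrival_rates, frequencies = None, None
    log = []
    for k in range(num_intervals):
        if k % T == 0:                          # Step 102: start of a frame?
            sampled = list(queues)              # Step 112: sample virtual queues
            arrival_rates = solve_arrival_rates(sampled)   # Step 114
            frequencies = solve_frequencies(sampled)       # Step 116
        action = choose_action(state, frequencies)         # Step 104
        service = perform_action(action, state)            # Step 106
        queues = [max(q - s, 0.0) + r                       # Step 108
                  for q, s, r in zip(queues, service, arrival_rates)]
        state = next_state(state, action, k)                # Step 110
        log.append((k, action))
    return queues, log

# Example with trivial stand-ins for every callable.
frame_starts = []
final_queues, history = run_scheduler(
    num_intervals=7, T=3, initial_state=0, queues=[0.0, 0.0],
    solve_arrival_rates=lambda q: (frame_starts.append(len(frame_starts)) or [1.0, 1.0]),
    solve_frequencies=lambda q: {},
    choose_action=lambda s, x: (0, 1),
    perform_action=lambda a, s: [0.5, 2.0],
    next_state=lambda s, a, k: s + 1,
)
print(final_queues, len(frame_starts))
```

Because the virtual arrival rates and state-action frequencies are recomputed only every T intervals, the per-interval work reduces to a table lookup, a transmission, and two cheap updates.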

The invention may be implemented in hardware, firmware or software, or a combination of the three. Preferably the invention is implemented in a computer program executed on a programmable computer having a processor, a data storage system, volatile and non-volatile memory and/or storage elements, at least one input device and at least one output device. More details are discussed in U.S. patent Ser. No. 8/380,557, the content of which is incorporated by reference.

By way of example, a block diagram of a computer to support the system is discussed in conjunction with FIG. 3. The computer preferably includes a processor, random access memory (RAM), a program memory (preferably a writable read-only memory (ROM) such as a flash ROM) and an input/output (I/O) controller coupled by a CPU bus. The computer may optionally include a hard drive controller which is coupled to a hard disk and CPU bus. Hard disk may be used for storing application programs, such as the present invention, and data. Alternatively, application programs may be stored in RAM or ROM. I/O controller is coupled by means of an I/O bus to an I/O interface. I/O interface receives and transmits data in analog or digital form over communication links such as a serial link, local area network, wireless link, and parallel link. Optionally, a display, a keyboard and a pointing device (mouse) may also be connected to I/O bus. Alternatively, separate connections (separate buses) may be used for I/O interface, display, keyboard and pointing device. Programmable processing system may be preprogrammed or it may be programmed (and reprogrammed) by downloading a program from another source (e.g., a floppy disk, CD-ROM, or another computer).

Each computer program is tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.

I. INTRODUCTION

Multiple Input Multiple Output (MIMO) technology is essential for the emerging 4G-LTE wireless communication systems. In the downlink of such a system, which typically has several active users, multiple antennas enable simultaneous transmissions to multiple users by allowing the transmitter (base-station) to transmit (along directions in a signal space) in a manner which ensures that each user can receive its intended signal along at-least one interference-free dimension (a.k.a. the Multi-user MIMO principle) [1]. The number of active users is generally greater than the maximum supportable number of simultaneous transmissions, which in turn is equal to the number of transmit antennas at the base-station (BS). Consequently, only a subset of users can be selected for the MU-MIMO transmission and hence proper user scheduling is important to achieve a desired network utility (e.g., throughput, fairness).

The usual assumption made in existing literature on MU-MIMO scheduling is that the BS can obtain the channel state information from all users with sufficient accuracy and with negligible delay. Such information, referred to as the Channel State Information at the Transmitter (CSIT), is crucial to ensure that each scheduled user is not dominated by co-channel interference. Typically, the BS obtains CSIT by broadcasting a sequence of pilot symbols, and the users in turn estimate their CSI and feed back their quantized estimates to the BS. This feedback process introduces two sources of imperfection in the CSIT: (1) estimation and quantization errors (due to limited training and finite codebooks); and (2) delays (due to user processing speeds and less flexible scheduling on the feedback channel). The impact of erroneous CSIT on MU-MIMO performance has been analyzed in [2] and utility maximization for MU-MIMO with erroneous CSIT has been considered in [4]. Delay in the CSIT has hitherto been addressed by using prediction based approaches, but their drawback is that they have to assume a model for channel evolution, which is difficult to obtain in practice, and they also require the delay to be small enough to allow for useful prediction.

For the scenario where the number of users is small enough so that user scheduling is unnecessary, referred to here as the static scenario, Maddah-Ali and Tse proposed a scheme, namely the MAT scheme [5], that utilizes CSIT that is error-free albeit completely outdated. Their seminal work revealed that the outdated CSI is an important resource that, when combined with the eavesdropped information at the users, can provide a considerable performance gain in terms of degrees of freedom. Recently, the MAT scheme was extended (for the static scenario) to the hybrid CSIT case by also incorporating coarse and current CSIT [6] to obtain further system gains. However, in the ubiquitous setting where user scheduling is important, such hybrid CSIT needs to be exploited wisely since it is costly to obtain even delayed but error-free CSI feedback from all users for making the scheduling decisions. Indeed, the problem is quite different and more challenging than the static case. User scheduling for the MAT scheme has been considered in [3] but their suggested method is akin to the myopic approach discussed later in this paper.

In this paper, we study MU-MIMO downlink scheduling with hybrid CSIT, erroneous as well as delayed, where the time axis is divided into separate scheduling intervals. We consider the realistic scenario where current and coarse CSIT is obtained from all users while more accurate (not necessarily perfect) but delayed CSIT is obtained only from the scheduled users. The scheduling problem is hence characterized by an intricate ‘exploitation—exploration tradeoff’, between scheduling the users based on current CSIT for immediate gains, and scheduling them to obtain finer albeit delayed CSIT and potentially larger future gains. The contributions of the paper are listed as follows.

We tackle the aforementioned ‘exploitation—exploration tradeoff’ by formulating a frame based joint scheduling and feedback approach, where in each frame a policy is obtained as the solution to a Markov Decision Process (MDP), the latter solution being determined via a state-action frequency approach [10][11].

We consider a general utility function and associate a virtual queue with each user that guides the achieved utility for that user. Based on MDP solutions and virtual queue evolutions, we show that our proposed frame-based joint scheduling and feedback approach can be made arbitrarily close to the optimal.

In the following we use (.)^(T), (.)^(†) for the transpose and conjugate transpose, respectively. Moreover, [A, B] and [A; B] are used to denote column-wise and row-wise concatenation of matrices A and B, respectively. ∥A∥ is used to denote the Frobenius norm of the matrix A.

II. SYSTEM MODEL AND PROBLEM FORMULATION

We consider the downlink MU-MIMO scheduling problem with one Base Station (BS) and N users. The BS is equipped with M_(t) transmit antennas and employs linear transmit precoding. Each user is equipped with a single receive antenna. Time is divided into intervals and we let h_(i)[k] ∈ ℂ^(1×M_(t)), i=1, . . . , N denote the channel state vector seen by user i in interval k. In each interval, a subset of users can be simultaneously scheduled. Further, since each user has only one receive antenna, it can achieve at-most one degree of freedom (i.e., its average data rate per channel use can scale with SNR as log(SNR)). On the other hand, the system can achieve at-most M_(t) degrees of freedom in that the total average system rate can scale with SNR as M_(t) log(SNR). For notational convenience we assume that in each interval two users can be simultaneously served, hence limiting the achievable system degrees of freedom to 2. All results can however be extended to the general case without this restriction.

A. Conventional MU-MIMO Scheme

The conventional MU-MIMO scheme relies on estimates of the user channel states (that are available at the BS) for the current interval. Indeed, perfect CSIT for the current interval enables the BS to transmit simultaneously to both scheduled users without causing interference at either of them. However, in the absence of perfect CSIT such complete interference suppression via transmitter side processing is no longer possible, and when only very coarse estimates for the current interval are available, conventional MU-MIMO breaks down and in-fact becomes inferior to simple single-user per interval transmission.

B. Joint Scheduling and Channel Feedback

We consider a joint scheduling and channel feedback scheme that builds upon a variant of the extended MAT technique [6]. The extended MAT scheme is recapitulated in Appendix-A. Specifically, we assume that coarse quantized channel state estimates from all users for the current interval are available to the BS, along with limited finer albeit outdated quantized channel state estimates. In this context we note that in the FDD downlink only quantized estimates are available to the BS and henceforth unless otherwise mentioned, we will use “estimates” to mean “quantized estimates”.

The time duration of interest is divided into intervals, with each interval comprising 3 slots. The three slots are mutually orthogonal time-bandwidth slices. For convenience, we assume that all three slots in an interval are within the coherence time and coherence bandwidth window so that the channel seen by each user remains constant over the three slots in an interval. At the beginning of the k^(th) interval, whose corresponding slots are denoted by [k, 1], [k, 2] and [k, 3], the scheduler broadcasts a short sequence of pilot symbols to all the users. This sequence enables a coarse estimation of the wireless channel at each of the N users, which is fed back to the BS after quantization and is denoted by Ĥ[k]={ĥ_(i)[k], i=1, . . . N}, where ĥ_(i)[k] denotes the coarse channel estimate obtained from user i for interval k.

Based on these coarse estimates, along with its past scheduling and channel state history (formally introduced next), the scheduler chooses a pair of users to schedule in the current interval, where in the first slot a linear combination of new packets is sent for the selected user pair. Data transmission to the selected user pair in the current interval also contains additional pilots that enable a finer estimation of the channel states seen by that user pair over the current interval. Note that such finer estimation is crucial for data detection. However, due to user processing and feedback delays, we assume that (quantized versions of) such finer estimates are not available to the BS during the current interval itself. Because of this constraint, instead of performing the transmissions in slots 2 and 3 for interference resolution for the packets sent in Slot 1 of the current interval, as would be done in the extended MAT scheme [6], the BS performs transmissions for interference resolution for packets sent in Slot 1 of the prior most recent interval when the selected user pair was scheduled. The scheduling model is illustrated in FIG. 2.

As mentioned above the scheduler obtains a finer estimate of the channel states seen by a user pair on the interval in which they are scheduled, at the end of that interval.¹ Let θ=(u₁, u₂, κ) represent the 3-tuple denoting the scheduling decision made for the current interval k such that u₁, u₂ denote the selected user pair and κ denotes the index of the prior most recent interval over which that pair was scheduled. We let Γ[k] be the collection of the most recently obtained coarse and finer channel estimates at the BS for each of the user pairs and their corresponding interval indices, at the start of interval k. Thus, the set Γ[k] takes the form Γ[k]={(ĥ_(i)[κ_(i,j)], ĥ_(j)[κ_(i,j)], {hacek over (h)}_(i)[κ_(i,j)], {hacek over (h)}_(j)[κ_(i,j)], κ_(i,j)), 1≦i<j≦N}, where ({hacek over (h)}_(i)[κ_(i,j)], {hacek over (h)}_(j)[κ_(i,j)]) denote the finer estimates for interval κ_(i,j) and κ_(i,j) denotes the index of the prior recent-most interval on which pair i, j was scheduled. At the end of that interval (equivalently at the start of interval k+1) the set Γ[k+1] is obtained by first setting it equal to Γ[k] and then updating the tuple corresponding to the pair (u₁, u₂) selected in interval k to (ĥ_(u) ₁ [k], ĥ_(u) ₂ [k], {hacek over (h)}_(u) ₁ [k], {hacek over (h)}_(u) ₂ [k], k). Arbitrary delays in obtaining such finer estimates are also considered later in the paper.
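The bookkeeping for the set Γ[k] described above can be sketched in Python as a dictionary keyed by user pair. The function name `update_gamma`, the use of a dictionary, and the placeholder strings standing in for channel estimates are assumptions made for illustration.

```python
def update_gamma(gamma, pair, k, coarse, fine):
    """Return Gamma[k+1] from Gamma[k] after `pair` was scheduled in interval k.

    gamma:  dict mapping (i, j) with i < j to the tuple
            (coarse_i, coarse_j, fine_i, fine_j, kappa_ij)
    coarse: dict user -> coarse estimate reported for interval k
    fine:   dict user -> finer estimate obtained for interval k
    """
    i, j = sorted(pair)
    new_gamma = dict(gamma)                 # first set Gamma[k+1] equal to Gamma[k]
    # Only the tuple of the scheduled pair is refreshed; kappa becomes k.
    new_gamma[(i, j)] = (coarse[i], coarse[j], fine[i], fine[j], k)
    return new_gamma

# Example with string placeholders instead of channel vectors.
gamma_k = {
    (1, 2): ("c1_old", "c2_old", "f1_old", "f2_old", 0),
    (1, 3): ("c1", "c3", "f1", "f3", 2),
}
gamma_next = update_gamma(
    gamma_k, pair=(2, 1), k=5,
    coarse={1: "c1_new", 2: "c2_new"},
    fine={1: "f1_new", 2: "f2_new"},
)
print(gamma_next[(1, 2)])
```

Keeping one tuple per pair means the stored state grows with the number of user pairs, N(N−1)/2, rather than with the length of the scheduling history.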

The user channel states are assumed to be i.i.d. across intervals and the channel states of any two distinct users are assumed to be mutually independent. Given particular coarse estimates of the channel states of the user pair selected in interval k, (ĥ_(u) ₁ [k], ĥ_(u) ₂ [k]), the distribution of the finer channel estimates in the same interval is described by the conditional distribution

$P\left(\check{h}_{u_1}[k],\ \check{h}_{u_2}[k] \mid \hat{h}_{u_1}[k],\ \hat{h}_{u_2}[k]\right) \qquad (1)$

where the conditional probability depends on the types of channel estimators, quantization, training times and powers, etc. We let C_(coarse) (C_(fine)) denote the finite sets or codebooks of vectors from which all coarse (fine) estimates are selected. Let |C_(coarse)| and |C_(fine)| denote their respective cardinalities and clearly |C_(fine)|≧|C_(coarse)|.

C. Expected Transmission Rates (Rewards)

During the current interval k, formed by slots [k, 1], [k, 2] and [k, 3], once a pair of users is selected, the scheduler specifies transmit precoding matrices or vectors for each slot in the interval.

1) Slot 1: For slot 1, the overall transmit precoding matrix is denoted by the matrix [W_(u) ₁ [k], W_(u) ₂ [k]], where W_(u) ₁ [k], W_(u) ₂ [k] ∈ ℂ^(M_(t)×2). Let x_(u) ₁ [k]=W_(u) ₁ [k]s_(u) ₁ [k], x_(u) ₂ [k]=W_(u) ₂ [k]s_(u) ₂ [k], where s_(u) ₁ [k], s_(u) ₂ [k] denote the 2×1 symbol vectors containing symbols formed using the new packets intended for user u₁ and u₂, respectively, and where E[s_(u) _(i) [k]s_(u) _(i) ^(†)[k]]=I, i ∈ {1, 2}. Then, the signal transmitted in slot-1 is x_(u) ₁ [k]+x_(u) ₂ [k] so that the received signals at both users are

$y_{u_1}[k,1] = h_{u_1}[k]\left(x_{u_1}[k] + x_{u_2}[k]\right) + n_{u_1}[k,1], \qquad (2)$

$y_{u_2}[k,1] = h_{u_2}[k]\left(x_{u_1}[k] + x_{u_2}[k]\right) + n_{u_2}[k,1]. \qquad (3)$

Note that the allocated transmission power for scheduled user u_(i) is the squared norm ∥W_(u) _(i) [k]∥². We assume that the maximum average (per-slot) transmission power budget at the BS is P. Thus, the corresponding power constraint is ∥W_(u) ₁ [k]∥²+∥W_(u) ₂ [k]∥²≦P. Notice that the precoding matrix [W_(u) ₁ [k], W_(u) ₂ [k]] seeks to facilitate the transmission of new packets to users u₁ and u₂ and thus must be designed based on the available coarse estimates (ĥ_(u) ₁ [k], ĥ_(u) ₂ [k]), since the corresponding finer estimates for that interval are not yet available to the scheduler. Accordingly, we assume that this precoding matrix can be obtained as the output of any arbitrary but fixed (time-invariant) mapping from C_(coarse)×C_(coarse) to ℂ^(M_(t)×4), when the coarse estimates (ĥ_(u) ₁ [k], ĥ_(u) ₂ [k]) are given as an input. Note that assuming the mapping to be fixed is well suited to systems where the so-called “precoded pilots” are not available so that the choice of precoders needs to be signalled to the scheduled users. A fixed mapping (which is equivalent to one codebook of transmit precoders) then allows for efficient signaling.
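The slot-1 power constraint can be checked, and enforced by a common scaling, as in the following Python sketch. The helper names `frobenius_sq` and `scale_to_budget` are hypothetical, and uniform scaling is only one simple way to meet the budget; the text above leaves the actual precoder mapping arbitrary.

```python
def frobenius_sq(W):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(abs(entry) ** 2 for row in W for entry in row)

def scale_to_budget(W1, W2, P):
    """Scale both precoders by a common gain so that ||W1||^2 + ||W2||^2 <= P."""
    total = frobenius_sq(W1) + frobenius_sq(W2)
    if total <= P:
        return W1, W2                        # constraint already satisfied
    gain = (P / total) ** 0.5                # power scales with gain squared
    rescale = lambda W: [[gain * entry for entry in row] for row in W]
    return rescale(W1), rescale(W2)

# Example with M_t = 2 antennas and 2 streams per user (illustrative values).
W1 = [[2.0, 0.0], [0.0, 0.0]]
W2 = [[0.0, 0.0], [0.0, 2.0]]
W1s, W2s = scale_to_budget(W1, W2, P=2.0)
print(frobenius_sq(W1s) + frobenius_sq(W2s))
```

A common gain preserves the precoding directions chosen from the coarse estimates and changes only the total transmit power.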

2) Slot 2: In slot 2 of the interval, an interference resolving packet for a pending previous transmission involving users (u₁, u₂), sent in interval κ<k, is transmitted. In particular, the transmitted signal vector over the M_(t) antennas is

${{z\left\lbrack {k,2} \right\rbrack}\left( {{{\overset{\Cup}{h}}_{u_{1}}\lbrack\kappa\rbrack}\underset{x_{u_{2}{\lbrack\kappa\rbrack}}}{\underset{}{{W_{u_{2}}\lbrack\kappa\rbrack}{s_{u_{2}}\lbrack\kappa\rbrack}}}} \right)},$

where z[k, 2] ∈ ℂ^(M_(t)×1) is a precoding vector. Note that {hacek over (h)}_(u) ₁ [κ]x_(u) ₂ [κ] is a scalar, so the average power constraint E[∥z[k,2]{hacek over (h)}_(u) ₁ [κ]x_(u) ₂ [κ]∥²]≦P can also be written as ∥z[k,2]∥²∥{hacek over (h)}_(u) ₁ [κ]W_(u) ₂ [κ]∥²≦P. The received signals in slot 2 at both users are therefore

$y_{u_1}[k,2] = h_{u_1}[k]\,z[k,2]\left(\check{h}_{u_1}[\kappa]\,x_{u_2}[\kappa]\right) + n_{u_1}[k,2], \qquad (4)$

$y_{u_2}[k,2] = h_{u_2}[k]\,z[k,2]\left(\check{h}_{u_1}[\kappa]\,x_{u_2}[\kappa]\right) + n_{u_2}[k,2]. \qquad (5)$

3) Slot 3: In slot 3 of the interval, similarly, the transmitted signal is

${z\left\lbrack {k,3} \right\rbrack}{\left( {{{\overset{\Cup}{h}}_{u_{2}}\lbrack\kappa\rbrack}\underset{x_{u_{1}{\lbrack\kappa\rbrack}}}{\underset{}{{W_{u_{1}}\lbrack\kappa\rbrack}{s_{u_{1}}\lbrack\kappa\rbrack}}}} \right).}$

so that the power constraint is ∥z[k,3]∥²∥{hacek over (h)}_(u) ₂ [κ]W_(u) ₁ [κ]∥²≦P. The received signals in slot 3 at both users are therefore

$y_{u_1}[k,3] = h_{u_1}[k]\,z[k,3]\left(\check{h}_{u_2}[\kappa]\,x_{u_1}[\kappa]\right) + n_{u_1}[k,3], \qquad (6)$

$y_{u_2}[k,3] = h_{u_2}[k]\,z[k,3]\left(\check{h}_{u_2}[\kappa]\,x_{u_1}[\kappa]\right) + n_{u_2}[k,3]. \qquad (7)$

Notice that the precoding vectors z[k, 2], z[k, 3] seek to facilitate the completion of a pending transmission to users u₁ and u₂ and thus must be designed based on the available coarse estimates (ĥ_(u) ₁ [k], ĥ_(u) ₂ [k]), as well as the available estimates for interval κ which are ({hacek over (h)}_(u) ₁ [κ], {hacek over (h)}_(u) ₂ [κ]) and (ĥ_(u) ₁ [κ], ĥ_(u) ₂ [κ]). Accordingly, we assume that these two vectors can be obtained as the output of an arbitrary but fixed mapping from C_(fine) ²×C_(coarse) ⁴ to ℂ^(M_(t)×2). An example of mapping rules to obtain the precoding matrices and vectors is given later in the section on simulation results.
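The role of the slot-2 transmission in Equation (4) can be illustrated with a small noise-free numerical sketch in Python. It uses M_(t) = 2 and arbitrary example values; the helper names (`dot`, `slot1`, `slot2`, `cancel_interference`) are hypothetical. User u₁ divides its slot-2 observation by the scalar h_(u₁)[k]z[k,2] and subtracts the result from what it received in slot 1 of interval κ, which removes the interference from user u₂'s signal up to the error in the finer estimate.

```python
def dot(row, col):
    """Product of a 1 x M_t row vector with an M_t x 1 column vector."""
    return sum(a * b for a, b in zip(row, col))

def slot1(h, x1, x2, noise=0j):
    # Equations (2)/(3): both users' precoded signals are superimposed.
    return dot(h, [a + b for a, b in zip(x1, x2)]) + noise

def slot2(h_now, z, h_fine_u1_old, x2_old, noise=0j):
    # Equations (4)/(5): the scalar (fine estimate times x_u2) is sent along z.
    return dot(h_now, z) * dot(h_fine_u1_old, x2_old) + noise

def cancel_interference(y_old_slot1, y_slot2, h_now, z):
    # User u1 subtracts the interference estimate recovered from slot 2.
    return y_old_slot1 - y_slot2 / dot(h_now, z)

# Illustrative values for interval kappa (old) and interval k (now).
h_old = [1 + 1j, 2 + 0j]          # true channel of u1 in interval kappa
h_fine = [1 + 1j, 1.5 + 0j]       # finer (still imperfect) estimate of h_old
x1_old, x2_old = [1 + 0j, 0j], [0j, 1j]
h_now, z = [0.5 + 0j, 1j], [1 + 0j, 1 + 0j]

y_old = slot1(h_old, x1_old, x2_old)
y_2 = slot2(h_now, z, h_fine, x2_old)
residual = cancel_interference(y_old, y_2, h_now, z)

# Desired signal plus the leftover interference caused by the estimation error.
error = [a - b for a, b in zip(h_old, h_fine)]
expected = dot(h_old, x1_old) + dot(error, x2_old)
print(abs(residual - expected) < 1e-12)
```

When the finer estimate equals the true channel, the leftover interference term vanishes and user u₁ is left with its desired signal only.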

Next, in order to compute the average rates (rewards) we assume that the channel state vectors h_(u) _(i) [κ], h_(u) _(i) [k] are known perfectly to user u_(i), i∈{1, 2} (each user of course also knows the quantized estimates it has fed back to the base-station). In addition, user u₁ (u₂) is also conveyed the finer estimate {hacek over (h)}_(u) ₂ [κ] ({hacek over (h)}_(u) ₁ [κ]) via feed-forward signaling before the start of interval k. For simplicity, the feedback and feed-forward signaling overheads are ignored in this work. Then, by the end of slot 3, from (2), (4) and (6), at user u₁, we have

$\begin{matrix} {{{{y_{u_{1}}\left\lbrack {\kappa,1} \right\rbrack} - \frac{y_{u_{1}}\left\lbrack {k,2} \right\rbrack}{{h_{u_{1}}\lbrack k\rbrack}{z\left\lbrack {k,2} \right\rbrack}}} = {{{h_{u_{1}}\lbrack\kappa\rbrack}{x_{u_{1}}\lbrack\kappa\rbrack}} + {\left( {{h_{u_{1}}\lbrack\kappa\rbrack} - {{\overset{\Cup}{h}}_{u_{1}}\lbrack\kappa\rbrack}} \right){x_{u_{2}}\lbrack\kappa\rbrack}} + {n_{u_{1}}\left\lbrack {\kappa,1} \right\rbrack} - \frac{n_{u_{1}}\left\lbrack {k,2} \right\rbrack}{{h_{u_{1}}\lbrack k\rbrack}{z\left\lbrack {k,2} \right\rbrack}}}},\mspace{20mu} {{y_{u_{1}}\left\lbrack {k,3} \right\rbrack} = {{\underset{\underset{\delta_{u_{1}}{\lbrack k\rbrack}}{}}{\left( {{h_{u_{1}}\lbrack k\rbrack}{z\left\lbrack {k,3} \right\rbrack}} \right)}{{\overset{\Cup}{h}}_{u_{2}}\lbrack\kappa\rbrack}{x_{u_{1}}\lbrack\kappa\rbrack}} + {n_{u_{1}}\left\lbrack {k,3} \right\rbrack}}},} & (8) \end{matrix}$

where the additive noise variables n_(u) ₁ [κ, 1], n_(u) ₁ [k, 2], n_(u) ₁ [k, 3] are i.i.d. circularly symmetric complex Gaussian variables with zero-mean and unit variance, CN(0, 1). Notice that the interference term (h_(u) ₁ [κ]−{hacek over (h)}_(u) ₁ [κ])x_(u) ₂ [κ] is independent of the desired signal as well as the additive noise. Letting h_(u) ₁ ^(error)[κ]=h_(u) ₁ [κ]−{hacek over (h)}_(u) ₁ [κ], the noise plus interference covariance for user u₁, denoted by Γ_(u) ₁ [k], is therefore

$\begin{bmatrix} {1 + {{{h_{u_{1}}^{error}\lbrack\kappa\rbrack}{W_{u_{2}}\lbrack\kappa\rbrack}}}^{2} + \frac{1}{{{{h_{u_{1}}\lbrack\kappa\rbrack}{z\left\lbrack {\kappa,2} \right\rbrack}}}^{2}}} & 0 \\ 0 & 1 \end{bmatrix}.$

Define G_(u) ₁ [k]=[h_(u) ₁ [κ]W_(u) ₁ [κ]; δ_(u) ₁ [k]{hacek over (h)}_(u) ₂ [κ]W_(u) ₁ [κ]] and note that G_(u) ₁ [k]∈ℂ^(2×2). Further, let H^(csi)((u₁, u₂), (κ, k))={{hacek over (h)}_(u) ₁ [κ], {hacek over (h)}_(u) ₂ [κ], ĥ_(u) ₁ [κ], ĥ_(u) ₂ [κ], ĥ_(u) ₁ [k], ĥ_(u) ₂ [k]} denote the set of channel state information at the scheduler for user pair u₁, u₂ over intervals κ, k. Then, using (8), the instantaneous information rate, denoted as I_(u) ₁ [k], is given by

$\begin{matrix} {{{I_{u_{1}}\lbrack k\rbrack} = {\frac{1}{3}\log {{I + {{\Gamma_{u_{1}}^{- 1}\lbrack k\rbrack}{G_{u_{1}}\lbrack k\rbrack}{G_{u_{1}}^{\dagger}\lbrack k\rbrack}}}}}},} & (9) \end{matrix}$

where the fraction ⅓ is to account for the fact that three slots are needed to obtain this rate. Then, (an optimistic value for) the average information rate that can be achieved via rateless coding (cf. [9]) is given by

R _(u) ₁ ^(opt) [k]=E[I _(u) ₁ [k]|H ^(csi)((u ₁ ,u ₂),(κ,k))].  (10)

A more conservative rate that is appropriate for conventional coding, denoted as R_(u) ₁ ^(conv)[k], is given by

r _(θ,u) ₁ (1−Pr(I _(u) ₁ [k]<r _(θ,u) ₁ |H ^(csi)((u ₁ ,u ₂),(κ,k)))),  (11)

where r _(θ,u) ₁ denotes the rate assigned (using any fixed mapping) to user u₁ in θ before transmission of new packets for the pair (u₁, u₂) in interval κ, based on the available coarse estimates ĥ_(u) ₁ [κ], ĥ_(u) ₂ [κ]. The rates corresponding to (10) or (11) can be derived in a similar manner for user u₂.
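As a rough illustration of how the instantaneous rate in (9) could be evaluated for one channel realization, the following sketch builds G_(u) ₁ [k] and the diagonal covariance Γ_(u) ₁ [k] from (8) and returns one third of the log-determinant. The function and argument names are hypothetical, the logarithm is taken to base 2 (rates in bits, an assumption), and vectors and matrices are plain Python lists so that no external library is needed.

```python
import math

def row_times_matrix(h, W):
    """Multiply a 1 x M row vector h by an M x 2 matrix W (given as a list of rows)."""
    return [sum(h[m] * W[m][c] for m in range(len(h))) for c in range(len(W[0]))]

def dot(h, z):
    """Inner product h z of a 1 x M row vector and an M x 1 column vector."""
    return sum(a * b for a, b in zip(h, z))

def instantaneous_rate(h_past, h_fine_own, h_fine_other, h_now, W1, W2, z2, z3):
    """Evaluate (9) for user u1 (hypothetical helper, sub-optimal filtering).

    h_past       : true channel of u1 in interval kappa
    h_fine_own   : fine estimate of u1's channel in interval kappa
    h_fine_other : fine estimate of u2's channel in interval kappa
    h_now        : true channel of u1 in interval k
    W1, W2       : M x 2 precoders used in interval kappa
    z2, z3       : M x 1 precoding vectors used in slots 2 and 3 of interval k
    """
    g1 = row_times_matrix(h_past, W1)                       # first row of G
    delta = dot(h_now, z3)                                  # delta_{u1}[k]
    g2 = [delta * v for v in row_times_matrix(h_fine_other, W1)]  # second row of G
    h_err = [a - b for a, b in zip(h_past, h_fine_own)]
    residual = sum(abs(v) ** 2 for v in row_times_matrix(h_err, W2))
    gamma11 = 1.0 + residual + 1.0 / abs(dot(h_now, z2)) ** 2     # Gamma = diag(gamma11, 1)
    # Entries of A = I + Gamma^{-1} G G^dagger (2 x 2).
    a11 = 1.0 + sum(abs(v) ** 2 for v in g1) / gamma11
    a12 = sum(x * complex(y).conjugate() for x, y in zip(g1, g2)) / gamma11
    a21 = sum(x * complex(y).conjugate() for x, y in zip(g2, g1))
    a22 = 1.0 + sum(abs(v) ** 2 for v in g2)
    det = complex(a11 * a22 - a12 * a21).real
    return math.log2(det) / 3.0
```

Averaging this quantity over the conditional distribution of the true channels given H^(csi)((u₁, u₂), (κ, k)), for example by Monte Carlo sampling, would give an estimate of the optimistic rate in (10).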

Note that in deriving the average rate in (10) or (11) we have assumed a simple albeit sub-optimal filtering at the user to suppress the interference from the transmission intended for the co-scheduled user. For completeness, we provide the average rate expressions for the case when the user employs the optimal linear filter and for brevity we only consider the optimistic rate for user u₁. Towards this end, we collect the observations received by user u₁ as

${\begin{bmatrix} {y_{u_{1}}\left\lbrack {\kappa,1} \right\rbrack} \\ {y_{u_{1}}\left\lbrack {k,2} \right\rbrack} \\ {y_{u_{1}}\left\lbrack {k,3} \right\rbrack} \end{bmatrix} = {{{F_{u_{1}}\lbrack k\rbrack}{x_{u_{1}}\lbrack\kappa\rbrack}} + {{{\overset{\sim}{F}}_{u_{1}}\lbrack k\rbrack}{x_{u_{1}}\lbrack\kappa\rbrack}} + \begin{bmatrix} {n_{u_{1}}\left\lbrack {\kappa,1} \right\rbrack} \\ {n_{u_{1}}\left\lbrack {k,2} \right\rbrack} \\ {n_{u_{1}}\left\lbrack {k,3} \right\rbrack} \end{bmatrix}}},{where}$ ${{F_{u_{1}}\lbrack k\rbrack} = \begin{bmatrix} {h_{u_{1}}\lbrack\kappa\rbrack} \\ 0 \\ {{\delta_{u_{1}}\lbrack k\rbrack}{{\overset{\Cup}{h}}_{u_{2}}\lbrack\kappa\rbrack}} \end{bmatrix}},{{{\overset{\sim}{F}}_{u_{1}}\lbrack k\rbrack} = \begin{bmatrix} {h_{u_{1}}\lbrack\kappa\rbrack} \\ {{h_{u_{1}}\lbrack k\rbrack}{z\left\lbrack {k,2} \right\rbrack}{{\overset{\Cup}{h}}_{u_{1}}\lbrack\kappa\rbrack}} \\ 0 \end{bmatrix}}$

For this model, we can determine the instantaneous information rate that can be achieved via optimal filtering using (9) but where Γ_(u) ₁ [k]=I+{tilde over (F)}_(u) ₁ [k]W_(u) ₂ [κ]W_(u) ₂ ^(†)[κ]{tilde over (F)}_(u) ₁ ^(†)[k] and G_(u) ₁ [k]=F_(u) ₁ [k]W_(u) ₁ [κ]. The average information rate can then be determined as before using (10).

We assume that either conventional coding is employed for all users or rateless coding is employed and accordingly let R_(u) _(i) [k], 1≦i≦2 denote the average rate, henceforth referred to also as the service rate, obtained over interval k. We also note here that the scheduling scheme (policy) is preceded by an initial set-up phase comprising N(N−1)/2 intervals in which new packets are transmitted successively to each user pair without any accompanying interference resolution packets. For notational convenience, we assume that the scheduling policy starts operating from the interval with index 0 using the initial set Γ[0] determined by the set-up phase.

D. Incorporating One-Shot Transmissions and Feedback Delays

We first consider the case of one-shot transmissions. To enable one-shot transmission of packets to any pair in any interval k, we define an action θ in which u₁, u₂ is the pair but κ=φ to capture the fact that the intended transmission is one-shot and hence does not seek to resolve any pending previous transmission. Then, in all three slots of that interval transmission is done as in conventional MU-MIMO relying only on the available current estimates Ĥ[k]. In particular, a transmit precoder [w_(u) ₁ [k], w_(u) ₂ [k]]∈ℂ^(M) ^(t) ^(×2) is formed based on {ĥ_(u) ₁ [k], ĥ_(u) ₂ [k]} using a technique such as zero-forcing [8]. Defining I_(u) ₁ ^(one-shot)[k]=log(1+|h_(u) ₁ [k]w_(u) ₁ [k]|²/(1+|h_(u) ₁ [k]w_(u) ₂ [k]|²)), the corresponding average rates obtained for user u₁ (similarly for user u₂) are given by

E[I _(u) ₁ ^(one-shot) [k]|ĥ _(u) ₁ [k],ĥ _(u) ₂ [k]],  (12)

or

r _(θ,u) ₁ (1−Pr(I _(u) ₁ ^(one-shot) [k]<r _(θ,u) ₁ |ĥ _(u) ₁ [k],ĥ _(u) ₂ [k]))

In addition at the end of interval k, we simply set Γ[k+1]=Γ[k] since no pending packets are completed or introduced.
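The one-shot transmission can be sketched numerically as follows. The code forms two zero-forcing beams from the estimated channels via the pseudo-inverse and then evaluates I_(u) ₁ ^(one-shot)[k] for a given true channel. The helper names, the equal power split between the two beams and the base-2 logarithm are assumptions made for illustration only, and the estimated channels are assumed not to be collinear.

```python
import math

def zero_forcing_beams(h1_hat, h2_hat, power=1.0):
    """Zero-forcing beams for two users from their estimated 1 x M channels.

    Each beam is scaled to carry power/2 (equal split, an illustrative choice).
    """
    h1 = [complex(a) for a in h1_hat]
    h2 = [complex(a) for a in h2_hat]
    # Entries of the 2 x 2 Gram matrix H H^dagger with H = [h1; h2].
    g11 = sum(abs(a) ** 2 for a in h1)
    g22 = sum(abs(a) ** 2 for a in h2)
    g12 = sum(a * b.conjugate() for a, b in zip(h1, h2))
    # Unnormalized columns of the pseudo-inverse H^dagger (H H^dagger)^{-1}.
    w1 = [a.conjugate() * g22 - b.conjugate() * g12.conjugate() for a, b in zip(h1, h2)]
    w2 = [b.conjugate() * g11 - a.conjugate() * g12 for a, b in zip(h1, h2)]

    def scale(w):
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        return [math.sqrt(power / 2.0) * x / norm for x in w]

    return scale(w1), scale(w2)

def one_shot_rate(h1_true, w1, w2):
    """I^{one-shot} for user u1: log2(1 + |h1 w1|^2 / (1 + |h1 w2|^2))."""
    signal = abs(sum(a * b for a, b in zip(h1_true, w1))) ** 2
    interference = abs(sum(a * b for a, b in zip(h1_true, w2))) ** 2
    return math.log2(1.0 + signal / (1.0 + interference))
```

When the true channel equals its estimate the interference term vanishes. With only coarse estimates available it does not, which is the interference-limited behaviour of conventional MU-MIMO that the hybrid scheme is meant to avoid.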

Recall that so far we have assumed that upon choosing action θ for interval k, the finer estimates {hacek over (h)}_(u) ₁ [k], {hacek over (h)}_(u) ₂ [k] are available at the start of interval k+1 (representing a unit delay). In practical systems there can be a delay of several intervals in obtaining such finer estimates. Assuming that these delays are fixed and known in advance, they can be accommodated by expanding the definition of a state. In particular, we can define 4-tuples such as (i, j, κ_(i,j), d_(i,j)) where d_(i,j)≧0 measures the remaining delay after which finer estimates {hacek over (h)}_(i)[κ_(i,j)], {hacek over (h)}_(j)[κ_(i,j)] will be available. At any interval k selecting the action (i, j, κ_(i,j), d_(i,j)) with d_(i,j)>0 (d_(i,j)=0) constrains the interference resolution to be based only on the coarse estimates ĥ_(i)[κ_(i,j)], ĥ_(j)[κ_(i,j)], ĥ_(i)[k], ĥ_(j)[k] (on both coarse and fine estimates H^(csi)((i,j), (κ_(i,j), k))). Upon selecting this action the 4-tuple in Γ[k+1] corresponding to the pair i, j is set to be (i, j, k, d_(i,j)=D_(i,j)) where D_(i,j) is the maximum delay (starting from k+1) after which the finer estimates will be available. If that action is not selected, it is updated in Γ[k+1] as (i, j, κ_(i,j), d_(i,j)=max{0, d_(i,j)−1}). For convenience in exposition the aforementioned two extensions are not considered below.
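The bookkeeping for the delay-extended 4-tuples described above can be sketched as a small update function. The dictionary layout (one entry per user pair holding κ and the remaining delay d) and all names are illustrative assumptions, not part of the specification.

```python
def update_pending(pending, selected_pair, k, max_delay):
    """Return the pending set for interval k+1.

    pending       : dict mapping a pair (i, j) to (kappa_ij, d_ij)
    selected_pair : the pair scheduled in interval k
    k             : index of the current interval
    max_delay     : dict mapping a pair (i, j) to D_ij
    """
    updated = {}
    for pair, (kappa, d) in pending.items():
        if pair == selected_pair:
            # New packets were sent in interval k; fine CSI arrives after D_ij intervals.
            updated[pair] = (k, max_delay[pair])
        else:
            # Pair not scheduled: the remaining delay counts down and saturates at 0.
            updated[pair] = (kappa, max(0, d - 1))
    return updated
```

A pair whose entry has d = 0 can be resolved using both coarse and fine estimates, whereas a pair with d > 0 can only rely on the coarse ones.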

E. System State and Throughput Region

Define the system state at the start of an interval j as S[j]={Γ[j], Ĥ[j]} and let θ[j] denote the decision (action) taken in that interval. Then, at each interval k, a scheduling policy ψ takes as input all the history up to interval k, comprising states {S[j]}_(j=0) ^(k) and all decisions {θ[j]}_(j=0) ^(k-1), to output a decision θ[k]. Under a particular policy ψ, the throughput of the n^(th) user is denoted as

$\begin{matrix} {{r_{n}^{\psi} = {\lim\limits_{J\rightarrow\infty}{\frac{1}{J}{\sum\limits_{t = 0}^{J - 1}{{E\left\lbrack {R_{n}^{\psi}\lbrack t\rbrack} \right\rbrack}{\forall n}}}}}},} & (13) \end{matrix}$

where R_(n) ^(ψ)[t]=R_(n)[t]1(nεθ[t]) and the expectation is over the initial state and the evolution of the states and decisions in the subsequent intervals. Note that in (13) for simplicity we have assumed that the limit exists for the selected policy. In case the limit does not exist, we can consider any sub-sequence for which the limit exists. Let Ψ be the set of all policies. The throughput region that is of interest to us is defined as the closure of the convex hull of the throughput vectors achievable under all policies in Ψ, i.e.,

Λ=CH{r:∃ψεΨs.t.,r=r ^(ψ)},

where CH{•} denotes closure of the convex hull. For each throughput vector r, we obtain a utility value U(r), where U(•) is a non-negative, component-wise non-decreasing and concave utility function. For convenience, we also assume that the utility is continuous (and hence uniformly continuous) in the closed hypercube [0, b]^(N) for each finite b∈ℝ₊. The objective then is to maximize the network utility within the throughput region, i.e., max_(r:r∈Λ)U(r).

III. OPTIMAL FRAME-BASED SCHEDULING POLICY

In this section, we propose a frame based policy that achieves a utility arbitrarily close to the optimal. In this policy, the time intervals are further grouped into separate frames, where each frame consists of T consecutive intervals. The scheduling decisions in each frame are based on a set of virtual queues that guide the achieved system utility towards optimal, as specified next.

A. Virtual Queue and Virtual Arrival Process

To control the achieved utilities of different users, a virtual queue is maintained for each user, denoted as Q_(n)[k], k=0, 1, . . . & n=1, . . . , N. At the beginning of the τ^(th) frame comprising intervals {τT, . . . , (τ+1)T−1}, where τ∈{0, 1, 2, . . . }, the following optimization problem is solved at the scheduler

$\begin{matrix} {{{\max\limits_{r:{0r{r_{\max}1}}}{V \cdot {U(r)}}} - {\sum\limits_{n = 1}^{N}{{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}r_{n}}}},} & (14) \end{matrix}$

where r_(max), V are positive constants that can be freely chosen and whose role will be revealed later. We let r*[τ] be the optimal solution to the above problem. Then, the virtual arrival rate for user n is set as r*_(n)[τ] in each interval in the τ^(th) frame. A scheduling policy, Ψ*_(Q[τT]), is determined and implemented based on the virtual queue length Q[τT] obtained at the beginning of that frame. Letting R_(n) ^(Ψ*) ^(Q[τT]) [k] denote the service rate of user n in each interval k in the τ^(th) frame under this policy, the virtual queue is then updated as

Q _(n) [k+1]=(Q _(n) [k]−R _(n) ^(Ψ*) ^(Q[τT]) [k])⁺ +r* _(n)[τ],  (15)

for all τT≦k≦(τ+1)T−1 and each user n and where (x)⁺=max{0,x} with Q_(n)[0]=0 for all n.
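To make the per-frame computation concrete, the following sketch solves (14) in closed form for one particular utility and applies the queue update (15). The utility U(r)=Σ_n log(1+r_n) is an assumption chosen only because it is non-negative, non-decreasing, concave and separable, so that (14) decouples across users; the function names are hypothetical.

```python
def virtual_arrival_rates(queues, V, r_max):
    """Solve (14) for the assumed utility U(r) = sum_n log(1 + r_n).

    Per user, maximize V*log(1 + r) - Q*r over 0 <= r <= r_max.
    Setting the derivative V/(1 + r) - Q to zero gives r = V/Q - 1,
    which is then clipped to the interval [0, r_max].
    """
    rates = []
    for q in queues:
        if q <= 0:
            rates.append(r_max)  # no queue penalty: take the largest allowed rate
        else:
            rates.append(min(r_max, max(0.0, V / q - 1.0)))
    return rates

def update_queue(q, service_rate, arrival_rate):
    """Virtual queue update (15): Q[k+1] = (Q[k] - R[k])^+ + r*."""
    return max(0.0, q - service_rate) + arrival_rate
```

In this sketch a long virtual queue drives the corresponding virtual arrival rate towards zero, while a larger V admits higher arrival rates before the queue penalty takes over.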

B. State-Action Frequency Approach

We now determine the policy Ψ*_(Q[τT]) employed in the τ^(th) frame. Notice that while the definition of the system state adopted thus far allows us to compactly describe any policy, one associated drawback is that the number of states becomes countably infinite. Fortunately, there is one aspect that we can exploit. Note that the average rates obtained upon scheduling a pair of users i, j on any interval k depend only on the corresponding coarse and fine channel estimates in interval κ_(i,j) (which we recall denotes the most recent prior interval over which that pair was scheduled) and the coarse channel estimates in interval k but not on those interval indices. Then, to analyze the average rates offered by any policy, it suffices to define a finite set of states, S, as follows. A state s∈S is defined as a particular choice h_(i) ^(p,fine), h_(j) ^(p,fine), h_(i) ^(p,coarse), h_(j) ^(p,coarse), h_(i) ^(c,coarse), h_(j) ^(c,coarse) of coarse and fine channel estimates for each pair i, j, where the superscripts p, c denote past and current estimates, respectively. Consequently there are

${\underset{\_}{}} = {\left( {{_{fine}}^{2}{_{coarse}}^{2}} \right)^{\frac{N{({N - 1})}}{2}}{_{coarse}}^{N}}$

number of states. Note that a state S[k] in the previous definition would map to the state s∈S which has the choice {hacek over (h)}_(i)[κ_(i,j)], {hacek over (h)}_(j)[κ_(i,j)], ĥ_(i)[κ_(i,j)], ĥ_(j)[κ_(i,j)], ĥ_(i)[k], ĥ_(j)[k] for each pair i, j. A finite set of actions, A, is defined next to be the collection of all possible user pairs so that any a∈A uniquely identifies a user pair. Let P(s|s′, a) denote the transition probability, which we note can be determined using (1) and the facts that the finer past estimates of pairs not in a do not change and the current coarse estimates are i.i.d. across intervals. Letting P(A) denote the set of all probability distributions on A, any policy can be defined as a mapping which at each interval k takes as input all the history up to interval k, comprising states {s[j]}_(j=0) ^(k) and all actions {a[j]}_(j=0) ^(k-1), to output a distribution in P(A) from which the action a[k] can be generated. A stationary policy is one which at any interval k considers only the state s[k] to output a distribution in P(A) and where the output distribution depends only on the state s[k] but not on the interval index k. Under any stationary policy the sequence {s[k]}_(k=0) ^(∞) is a Markov Chain.
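To get a feel for how quickly the number of states given above grows, the count can be evaluated directly. The sketch below treats the codebook sizes as powers of two specified in bits and ignores any separate norm quantizer; both are simplifying assumptions, and the function name is hypothetical.

```python
def num_states(n_users, fine_bits, coarse_bits):
    """Number of states: (|C_fine|^2 |C_coarse|^2)^(N(N-1)/2) * |C_coarse|^N."""
    fine = 2 ** fine_bits        # |C_fine|
    coarse = 2 ** coarse_bits    # |C_coarse|
    pairs = n_users * (n_users - 1) // 2
    return (fine ** 2 * coarse ** 2) ** pairs * coarse ** n_users

# Four users with a 5-bit fine and a 2-bit coarse codebook.
print(num_states(4, 5, 2) == 2 ** 92)
```

Even for four users the count is 2^92 under these assumptions, which indicates why solving a linear program over all state-action pairs becomes demanding for practical system sizes.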

With these definitions in hand, we let R_(n)(s, a) denote the achieved transmission rate for user n when action a is taken and the system state is s. Denote the state action frequencies by {x(s, a)} _(sεS, aεA) , where we note that each x(s, a) lies in the unit interval [0, 1] and represents the frequency that the system state is at s and action a is taken. The state action frequencies need to satisfy the normalization equation

${{\sum\limits_{\underset{\_}{s},\underset{\_}{a}}{x\left( {\underset{\_}{s},\underset{\_}{a}} \right)}} = 1},$

and the balance equation

${\sum\limits_{\underset{\_}{a}}{x\left( {\underset{\_}{s},\underset{\_}{a}} \right)}} = {\sum\limits_{{\underset{\_}{s}}^{\prime},\underset{\_}{a}}{{P\left( {\left. \underset{\_}{s} \middle| {\underset{\_}{s}}^{\prime} \right.,\underset{\_}{a}} \right)}{{x\left( {{\underset{\_}{s}}^{\prime},\underset{\_}{a}} \right)}.}}}$

The above two equations form a state-action polytope X and let x denote any vector of state action frequencies lying in X. We next define a rate region as

$\begin{matrix} {\overset{\sim}{\Lambda} = \left\{ R:\; R_{n} = \sum\limits_{\underset{\_}{s}} \sum\limits_{\underset{\_}{a}} R_{n}\left( \underset{\_}{s},\underset{\_}{a} \right) x\left( \underset{\_}{s},\underset{\_}{a} \right)\ \ \forall n\ \ \&\ \ x \in \underset{\_}{X} \right\}.} & (16) \end{matrix}$

Then, given the virtual queue length q=Q[τT] we consider the following linear program (LP),

$\begin{matrix} {{\max\limits_{x}{\sum\limits_{\underset{\_}{s},\underset{\_}{a}}{q^{T}{R\left( {\underset{\_}{s},\underset{\_}{a}} \right)}{x\left( {\underset{\_}{s},\underset{\_}{a}} \right)}}}}{{s.t.\mspace{14mu} x} \in {\underset{\_}{}.}}} & (17) \end{matrix}$

We use x* to denote an optimal solution to the linear program and define R*=[R*₁, . . . , R*_(N)]^(T), where

$\begin{matrix} {{R_{n}^{*} = {\sum\limits_{\underset{\_}{s}}{\sum\limits_{\underset{\_}{a}}{{R_{n}\left( {\underset{\_}{s},\underset{\_}{a}} \right)}{x^{*}\left( {\underset{\_}{s},\underset{\_}{a}} \right)}}}}},{\forall{n.}}} & (18) \end{matrix}$

Using the Bayesian rule, we can identify the corresponding stationary policy Ψ*_(Q[τT]), which at any interval k in the τ^(th) frame first maps the state S[k] to its counterpart sεS. Then, if Σ _(a′)x*(s, a′)>0, it chooses action a using the probabilistic rule

${{P\left( {{pick}\mspace{14mu} \underset{\_}{a}\mspace{14mu} {at}\mspace{14mu} {state}\mspace{14mu} \underset{\_}{s}} \right)} = \frac{x^{*}\left( {\underset{\_}{s},\underset{\_}{a}} \right)}{\sum_{{\underset{\_}{a}}^{\prime}}{x^{*}\left( {\underset{\_}{s},{\underset{\_}{a}}^{\prime}} \right)}}},{\forall{\underset{\_}{a} \in {\underset{\_}{}.}}}$

On the other hand, if Σ _(a′)x*(s, a′)=0, it chooses action a arbitrarily. Let R^(frame)[k], τT≦k≦(τ+1)T−1, denote the service rate vectors obtained under this policy for the intervals in the τ^(th) frame.
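The step from an optimal state-action frequency vector x* to a stationary policy can be sketched as follows. The representation of x* as a dictionary keyed by (state, action) and the uniform choice for states with zero total frequency are illustrative assumptions; the text only requires that the choice in such states be arbitrary.

```python
def extract_policy(x_star, states, actions):
    """Turn state-action frequencies into per-state action probabilities.

    x_star  : dict mapping (state, action) to its frequency x*(s, a)
    states  : iterable of all states
    actions : list of all actions
    """
    policy = {}
    for s in states:
        total = sum(x_star.get((s, a), 0.0) for a in actions)
        if total > 0:
            # Conditional probability of each action given the state.
            policy[s] = {a: x_star.get((s, a), 0.0) / total for a in actions}
        else:
            # State carries no weight under x*: any rule is allowed, here uniform.
            policy[s] = {a: 1.0 / len(actions) for a in actions}
    return policy
```

For a deterministic optimal solution, each state with positive total frequency places all of its mass on a single action, so the extracted distribution degenerates to a fixed choice per state.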

We list the following results which can be obtained using those that have been derived before for weakly communicating Markov Decision Processes [10][11].

Lemma 1.

The region Λ defined in (13) is identical to the region {tilde over (Λ)} defined in (16). Further for each frame τ and any given Q[τT], an optimal solution to the LP in (17) can be found for which the corresponding policy Ψ*_(Q[τT]) is also deterministic.

Henceforth, we assume Ψ*_(Q[τT]) to be also deterministic.

Lemma 2.

For arbitrarily fixed δ>0 there exists a large enough frame length T_(o) and constants γ, β such that for each frame length T≧T_(o) and all Q[τT]

$\begin{matrix} {{\Pr\left( {{{{\frac{1}{T}\left( {\sum\limits_{j = 0}^{T - 1}{R^{frame}\left\lbrack {{\tau \; T} + j} \right\rbrack}} \right)} - R^{*}}} > \delta} \middle| {Q\left\lbrack {\tau \; T} \right\rbrack} \right)} \leq {\gamma \; {{\exp \left( {{- \beta}\; T} \right)}.}}} & (19) \end{matrix}$

C. Optimality of the Frame-Based Policy

Define Lyapunov function

${L\left( {Q\left\lbrack {\tau \; T} \right\rbrack} \right)} = {\frac{1}{2}{\sum\limits_{n = 1}^{N}{{Q_{n}^{2}\left\lbrack {\tau \; T} \right\rbrack}.}}}$

Then the T-step average Lyapunov drift is expressed as

${{\Delta_{T}\left( {Q\left\lbrack {\tau \; T} \right\rbrack} \right)} = {\frac{1}{T}{E\left\lbrack {{L\left( {Q\left\lbrack {\left( {\tau + 1} \right)T} \right\rbrack} \right)} - {L\left( {Q\left\lbrack {\tau \; T} \right\rbrack} \right)}} \middle| {Q\left\lbrack {\tau \; T} \right\rbrack} \right\rbrack}}},$

where the expectation is over the initial states at interval τT induced by the policies adopted in the previous frames and the evolution of the states and decisions in the τ^(th) frame under the policy Ψ*_(Q[τT]). Our first result is the following.

Proposition 1.

For any given ε>0, there exists a frame length T_(o) such that for all frame lengths T≧T_(o) the T-step average Lyapunov drift can be bounded as

$\begin{matrix} {{{\Delta_{T}\left( {Q\left\lbrack {\tau \; T} \right\rbrack} \right)} \leq {{BT} - {\sum\limits_{n = 1}^{N}{{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}R_{n}}} + {\sum\limits_{n = 1}^{N}{{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}{r_{n}^{*}\lbrack\tau\rbrack}}}}},} & (20) \end{matrix}$

where B is a constant and R=[R₁, . . . , R_(N)]^(T) is any vector such that R+ε1 εΛ. Proof. Proved in Appendix-C. □

Consider the ε-interior of Λ, i.e., Λ_(ε)={R:R+ε1∈Λ}. Denote by r_(ε) ^(opt) an optimal solution of the following optimization problem.

max   U(r) s.t.  r ∈ Λ_(ε); r ⪯ r_(max)1.

Our main result is the following.

Theorem 1.

For any given ε>0, there exists a T_(o) such that for all frame lengths T≧T_(o)

${\underset{J\rightarrow\infty}{\lim \; \inf \; U}\left( {\frac{1}{J}{\sum\limits_{t = 0}^{J - 1}{E\left\lbrack {R^{frame}\lbrack t\rbrack} \right\rbrack}}} \right)} \geq {{U\left( r_{\varepsilon}^{opt} \right)} - {{BT}/{V.}}}$

Proof. Proof Sketch in Appendix-D. □

Thus, by choosing ε, framelength T and parameters V,r_(max) appropriately, our frame based policy can be made arbitrarily close to optimal.

For comparison we will use the conventional MU-MIMO scheduling described in Section II-A. In addition, we also use the following myopic policy. This policy operates in a manner similar to the frame based policy but with the following important differences. Firstly, the frame-length is set as T=1 so that the arrival rates are computed at the start of each interval and the virtual queues are updated at the end of that interval. Then, at each interval k the current state S[k] is mapped to its image s∈S. Considering the queue length q=Q[k], the action â=arg max _(a∈A) q^(T)R(s, a) is selected. Clearly, this policy does not consider the transition probabilities (and the possible future evolutions) at all while deciding an action. Nevertheless, as seen in the following section, this policy indeed offers a competitive performance.
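The action selection of the myopic policy reduces to a queue-weighted maximization over user pairs, which might be sketched as below. The mapping from actions to per-user rate vectors R(s, a) is assumed to be precomputed for the current state, and all names are hypothetical.

```python
def myopic_action(q, rates_for_state):
    """Pick the action maximizing q^T R(s, a) for the current state s.

    q               : list of virtual queue lengths, one per user
    rates_for_state : dict mapping each action (user pair) to the list of
                      per-user average rates R(s, a) in the current state
    """
    def weighted_rate(action):
        return sum(qn * rn for qn, rn in zip(q, rates_for_state[action]))

    return max(rates_for_state, key=weighted_rate)

# Example with four users (indexed from 0) and three candidate pairs.
q = [1.0, 5.0, 1.0, 1.0]
rates = {
    (0, 1): [2.0, 1.0, 0.0, 0.0],
    (2, 3): [0.0, 0.0, 3.0, 3.0],
    (1, 2): [0.0, 1.5, 1.0, 0.0],
}
best = myopic_action(q, rates)
```

In this example the pair (1, 2) wins even though it offers the lowest sum rate, because user 1 has the longest virtual queue.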

IV. SIMULATION RESULTS

We consider a narrowband downlink with four single-antenna users that are served by a BS equipped with four transmit antennas. All users are assumed to experience an identical (large scale fading) pathloss factor δ and thus see an identical average SNR, which models the physical scenario in which all users are equidistant from the BS. Further, we model the small-scale fading seen by each user as Rayleigh fading so the channel response vector of each user is assumed to have i.i.d. CN(0, δ²) elements. Consequently the normalized channel response vector (i.e., channel direction) is isotropically distributed in ℂ^(4×1). Moreover, the channel response vectors evolve independently across intervals and are independent across users. In the following simulations, each user quantizes its channel norm and channel direction separately. In particular, the channel norm is quantized using a scalar quantizer which for simplicity we assume to be identical for both fine and coarse estimates. On the other hand, to quantize the channel direction, in order to obtain the finer estimate, the quantization codebook used comprises a set of independently generated instances of isotropic vectors in ℂ^(4×1) (a.k.a. random vector codebook), where we note that for large codebook sizes random vector codebooks have been shown to be a good choice for both SU-MIMO and conventional MU-MIMO. The quantization of the channel direction to obtain the coarser estimate is accomplished using Grassmannian codebooks.

Before offering our results, we consider an interval k and decision θ and describe the mapping rules alluded to in Section II-C. We determine a good direction (i.e., unit-norm beamforming vector) for multicasting using the alternating optimization based multicast beamforming design algorithm [12] that takes only the coarse estimates ĥ_(u) ₁ [k] and ĥ_(u) ₂ [k] as inputs and set

$\frac{z\left\lbrack {k,2} \right\rbrack}{{z\left\lbrack {k,2} \right\rbrack}}\mspace{14mu} {and}\mspace{14mu} \frac{z\left\lbrack {k,3} \right\rbrack}{{z\left\lbrack {k,3} \right\rbrack}}$

to be equal to this direction. The precoding matrix W_(u) ₁ [κ] is obtained by extending the naive zero-forcing design of conventional MU-MIMO to the model in (8). In particular, at interval κ the BS naively assumes that the coarse estimates ĥ_(u) ₁ [κ], ĥ_(u) ₂ [κ] it has are indeed equal to their respective exact channels (and hence their respective finer estimates). Then, at any future interval k (the knowledge of k is not assumed during interval κ) when pair (u₁, u₂) is next scheduled, under the naive assumption, (8) would reduce to

$\begin{matrix} {{{{y_{u_{1}}\left\lbrack {\kappa,1} \right\rbrack} - \frac{y_{u_{1}}\left\lbrack {k,2} \right\rbrack}{{h_{u_{1}}\lbrack k\rbrack}{z\left\lbrack {k,2} \right\rbrack}}} = {{{{\hat{h}}_{u_{1}}\lbrack\kappa\rbrack}{x_{u_{1}}\lbrack\kappa\rbrack}} + {n_{u_{1}}\left\lbrack {\kappa,1} \right\rbrack} - \frac{n_{u_{1}}\left\lbrack {k,2} \right\rbrack}{\left( {{h_{u_{1}}\lbrack k\rbrack}{z\left\lbrack {k,2} \right\rbrack}} \right)}}},\mspace{20mu} {\frac{y_{u_{1}}\left\lbrack {k,3} \right\rbrack}{\left( {{h_{u_{1}}\lbrack k\rbrack}{z\left\lbrack {k,3} \right\rbrack}} \right)} = {{{{\hat{h}}_{u_{2}}\lbrack\kappa\rbrack}{x_{u_{1}}\lbrack\kappa\rbrack}} + {\frac{n_{u_{1}}\left\lbrack {k,3} \right\rbrack}{\left( {{h_{u_{1}}\lbrack k\rbrack}{z\left\lbrack {k,3} \right\rbrack}} \right)}.}}}} & (21) \end{matrix}$

To remove dependence on k, all noise covariances are averaged so that (21) reduces to a point-to-point MIMO channel with channel matrix [ĥ_(u) ₁ [κ]; ĥ_(u) ₂ [κ]] and noise covariance diag{1+E[1/|h_(u) ₁ [k]z[k, 2]|²], E[1/|h_(u) ₁ [k]z[k, 3]|²]}. Notice however that due to the power constraints these expected values in turn depend on the choice of precoders W_(u) ₁ [κ], W_(u) ₂ [κ]. As a further simplification, we fix these expected values to be suitable scalars which are determined offline. The precoder W_(u) ₁ [κ] can now be obtained using the standard point-to-point MIMO precoder design algorithm [7]. The precoder W_(u) ₂ [κ] is computed in an analogous manner. Finally, the norms of the precoding vectors are fixed as

${{z\left\lbrack {k,2} \right\rbrack}} = \frac{\sqrt{P}}{{{{\overset{\Cup}{h}}_{u_{1}}\lbrack\kappa\rbrack}{W_{u_{2}}\lbrack\kappa\rbrack}}}$ and ${{z\left\lbrack {k,3} \right\rbrack}} = {\frac{\sqrt{P}}{{{{\overset{\Cup}{h}}_{u_{2}}\lbrack\kappa\rbrack}{W_{u_{1}}\lbrack\kappa\rbrack}}}.}$

In FIG. 4 we compare the sum rate utility obtained using conventional MU-MIMO that only uses the current CSI with that obtained using the myopic scheduling that uses only the delayed CSI (EMAT with delayed) and the myopic scheduling that uses the hybrid CSI (EMAT with hybrid), where for the latter two schemes the average rates are computed assuming both the sub-optimal and the optimal filtering. In all cases the channel norms were assumed to be perfectly quantized whereas a 2-bit coarse codebook and 5-bit fine codebook were employed to quantize the channel directions, respectively. As seen from the figure, the conventional MU-MIMO gets interference limited and the policy using the finer albeit delayed CSI offers significant gains, which are further improved by utilizing the hybrid CSI. The improvement is more marked upon using optimal filtering.

In FIG. 5 we consider the same setup as in the previous figure but now compare the sum rate utility obtained using the myopic scheduling that uses the hybrid CSI along with the optimal filtering, for different codebook sizes. In particular, in all cases the channel norms were assumed to be perfectly quantized and a 2-bit coarse codebook was employed. Four different codebook sizes (5, 10, 12, and 16 bits) for the fine codebook were employed and compared along with the case when perfect delayed CSI is available to the BS. As seen from the figure, to capture the promised multiplexing gains the codebook sizes must scale sufficiently fast with SNR. We note here that the MAT and EMAT schemes have been designed with the goal of achieving degree of freedom improvements, where aligning (confining) interference to a low dimensional subspace is the paramount concern. The substantial gap compared to the perfect delayed CSI performance observed at a fixed (finite) SNR can be alleviated via proper precoder design that is optimized for a finite SNR. We emphasize that the precoder optimization we undertook to produce this set of results was limited and adhered fully to the EMAT framework.

We also compared the sum rates obtained using our proposed policy and the myopic one, respectively, for simpler examples having fewer states. We found that for well designed quantization codebooks, the myopic policy performs very close to the optimal frame based policy. This observation, coupled with the fact that the complexity of the myopic policy scales much more benignly with the system size, makes it well suited to practical implementation.

V. CONCLUSIONS

We considered the DL MU-MIMO scheduling problem with hybrid CSIT and proposed an optimal frame-based joint scheduling and feedback approach. There are two important and interesting issues that are the focus of our current research. The foremost one pertains to the exceedingly large number of states that are needed to accommodate practical system sizes, which makes implementation of the frame based policy challenging even upon using commercial LP solvers. While the sparse nature of these linear programs can indeed be exploited, an efficient and significant reduction in the number of states is necessary. The second issue is the choice of the precoding matrices and vectors. Recall that in this work we have assumed the choice of precoders to be pre-determined and fixed for each (state, action) pair. To fully exploit the precoding gains and the availability of “precoded pilots” in future networks, we should relax this restriction. Finally, we remark that incorporating practical considerations such as delay constraints on scheduling is another important open issue.

REFERENCES

-   [1] D. Gesbert, M. Kountouris, R. W. Heath Jr, C. Chae and T. Salzer, “From single user to multiuser communications: shifting the MIMO paradigm,” IEEE Signal Proc. Mag., October, 2007.
-   [2] G. Caire, N. Jindal, M. Kobayashi and N. Ravindran, “Multiuser MIMO Achievable Rates With Downlink Training and Channel State Feedback,” IEEE Transactions on Information Theory, June, 2010.
-   [3] A. Adhikary, H. C. Papadopoulos, S. A. Ramprashad and Giuseppe Caire, “Multi-User MIMO with outdated CSI: Training, Feedback and Scheduling,” Allerton, October, 2011.
-   [4] H. Shirani-Mehr, G. Caire and M. Neely, “MIMO Downlink Scheduling with Non-Perfect Channel State Knowledge,” IEEE Trans. on Comm., July, 2010.
-   [5] M. A. Maddah-Ali and D. Tse, “Completely Stale Transmitter Channel State Information is Still Very Useful”, IEEE Trans. on Information Theory, July, 2012.
-   [6] M. Kobayashi, S. Yang, D. Gesbert and Xinping Yi, “On the Degrees of Freedom of time correlated MISO broadcast channel with delayed CSIT,” IEEE Trans. on Information Theory, January, 2013.
-   [7] D. Tse and P. Viswanath, “Fundamentals of wireless communication,” Cambridge University Press, 2005.
-   [8] A. Wiesel, Y. C. Eldar and S. Shamai, “Zero-forcing precoding and generalized inverses,” IEEE Trans. Signal Process., September 2008.
-   [9] H. Shirani-Mehr, H. Papadopoulos, S. Ramprashad and G. Caire, “Joint scheduling and ARQ for MU-MIMO downlink in the presence of inter-cell interference,” IEEE Trans. on Comm., October, 2011.
-   [10] E. Altman, “Constrained Markov Decision Processes”, Chapman & Hall, 1999.
-   [11] K. Jagannathan, S. Mannor, I. Menache and E. Modiano, “A state action frequency approach to throughput maximization over uncertain wireless channels,” IEEE INFOCOM, Shanghai, China, April 2011.
-   [12] H. Zhu, N. Prasad and S. Rangarajan, “Precoder Design for Physical Layer Multicasting,” IEEE Trans. Sig. Proc., November 2012.

A. Extended MAT Scheme

The MAT scheme [5] is an interesting tool that has been recently proposed to tackle the problem where no channel state estimates for the current interval are available at the BS but perfect albeit delayed CSI is available to the BS. The scheme uses such completely outdated CSIT but still achieves system degrees of freedom equal to 4/3. We recall that in our context MU-MIMO with perfect and current CSIT will achieve 2 system degrees of freedom while single-user transmission will achieve only one degree of freedom. In this paper, we will build upon the following extended MAT scheme [6] that achieves the same system degrees of freedom as the original MAT scheme.

The scheme proceeds as follows. Time is divided into units referred to as rounds. Two messages u and v are to be transmitted, each destined to users i and j respectively, where u and v are M_(t)×1 vectors. The three rounds are introduced next.

Round 1: The transmitted signal is x[1]=u+v, the corresponding received signal at user i and j is denoted by y_(i)[1] and y_(j)[1], where

y _(i)[1]=h _(i)[1](u+v)+n _(i)[1],  (22)

y _(j)[1]=h _(j)[1](u+v)+n _(j)[1],  (23)

where n_(i)[1] denotes the additive noise at user i in round 1 and h_(i)[1]∈C^(1×M_(t)) denotes the channel response vector seen by user i in Round 1.

Round 2: The transmitted signal is x[2]=[h_(i)[1]v; 0], the received signal for user i and j is respectively

y _(i)[2]=h _(i,1)[2]·(h _(i)[1]v)+n _(i)[2],  (24)

y _(j)[2]=h _(j,1)[2]·(h _(i)[1]v)+n _(j)[2],  (25)

where h_(i,1)[2] denotes the channel coefficient modeling the propagation environment between user i and the first transmit antenna at the BS during round 2.

Round 3: The transmitted signal is x[3]=[h_(j)[1]u; 0], the received signal for user i and j is respectively

y _(i)[3]=h _(i,1)[3]·(h _(j)[1]u)+n _(i)[3],  (26)

y _(j)[3]=h _(j,1)[3]·(h _(j)[1]u)+n _(j)[3],  (27)

It is assumed that the channel state vectors h_(i)[1], h_(i)[2], h_(i)[3] are estimated perfectly by user i at the start of each respective round. Similarly for user j. In addition, the BS is assumed to know channel state vectors h_(i)[l], h_(j)[l] perfectly but only after round l for l=1, 2, 3. Further, user i is also conveyed the channel vector h_(j)[1] and user j is conveyed the channel vector h_(i)[1] before the start of round 3, via feed-forward signaling.

Therefore, after Round 3, the i^(th) user can decode message u using (22), (24) and (26) as per the following,

${{{y_{i}\lbrack 1\rbrack} - \frac{y_{i}\lbrack 2\rbrack}{h_{i,1}\lbrack 2\rbrack}} = {{{h_{i}\lbrack 1\rbrack}u} + {n_{i}\lbrack 1\rbrack} - \frac{n_{i}\lbrack 2\rbrack}{h_{i,1}\lbrack 2\rbrack}}},{{y_{i}\lbrack 3\rbrack} = {{{{h_{i,1}\lbrack 3\rbrack} \cdot {h_{j}\lbrack 1\rbrack}}u} + {{n_{i}\lbrack 3\rbrack}.}}}$

Similarly, after Round 3, the j^(th) user can also decode message v. Notice that since the effective received observations seen by each user after three rounds can be modeled as the outputs of two linearly independent equations, each user can achieve two degrees of freedom over three rounds to attain system degrees of freedom equal to 4/3.
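The three rounds above can be checked with a small noiseless simulation. The following is only a sketch under stated assumptions: M_(t)=2, all noise terms set to zero, and channel coefficients drawn as independent complex Gaussians; the function and variable names are illustrative and do not appear in the scheme itself. It forms x[1], x[2], x[3] as in Rounds 1 to 3 and lets each user solve its two linearly independent equations.

```python
import random


def crandn(rng):
    """Draw one unit-variance circularly symmetric complex Gaussian sample."""
    return complex(rng.gauss(0, 1), rng.gauss(0, 1)) / 2 ** 0.5


def dot(h, x):
    """Row vector times column vector (no conjugation)."""
    return sum(a * b for a, b in zip(h, x))


def solve2(a, b):
    """Solve the 2x2 complex linear system a z = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    z0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    z1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [z0, z1]


def extended_mat_noiseless(seed=0):
    """Run the three rounds without noise (assumption) and decode u and v."""
    rng = random.Random(seed)
    u = [crandn(rng), crandn(rng)]  # message for user i
    v = [crandn(rng), crandn(rng)]  # message for user j
    # h_i[l - 1] and h_j[l - 1] are the 1x2 channel rows in round l = 1, 2, 3
    h_i = [[crandn(rng), crandn(rng)] for _ in range(3)]
    h_j = [[crandn(rng), crandn(rng)] for _ in range(3)]

    x1 = [u[0] + v[0], u[1] + v[1]]  # Round 1: x[1] = u + v
    x2 = [dot(h_i[0], v), 0j]        # Round 2: x[2] = [h_i[1] v; 0]
    x3 = [dot(h_j[0], u), 0j]        # Round 3: x[3] = [h_j[1] u; 0]

    y_i = [dot(h_i[l], x) for l, x in enumerate((x1, x2, x3))]
    y_j = [dot(h_j[l], x) for l, x in enumerate((x1, x2, x3))]

    # User i: cancel h_i[1] v using round 2, then use h_j[1] u from round 3
    u_hat = solve2([h_i[0], h_j[0]],
                   [y_i[0] - y_i[1] / h_i[1][0], y_i[2] / h_i[2][0]])
    # User j: cancel h_j[1] u using round 3, then use h_i[1] v from round 2
    v_hat = solve2([h_j[0], h_i[0]],
                   [y_j[0] - y_j[2] / h_j[2][0], y_j[1] / h_j[1][0]])
    return u, u_hat, v, v_hat
```

In this sketch each user recovers two symbols over three rounds, which is the 4/3 system degrees of freedom noted above. User i needs h_(j)[1] to build its second equation, which is why the cross channel vectors are fed forward before decoding.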

B. Practical Variations

We now present another variation of the extended MAT scheme that is more amenable to practical implementation. Notice that the scheme presented above involves estimation, quantization and signaling of channel response vectors. In addition, the precoders also need to be signaled to the scheduled users to enable decoding. This can be accomplished by restricting each precoder to lie in a codebook of possible precoders (known in advance to the BS and all users) and then sending an index that identifies a precoder in that codebook.

Another variant that instead exploits precoded pilots is described next. At (or prior to) the start of an interval k, all users will obtain an estimate of their respective channel vectors for the k^(th) interval (using un-precoded pilots transmitted by the BS), quantize their respective estimates and feed back the respective quantized estimates to the BS. This operation enables the BS to construct the set Ĥ[k]. Then, in interval k, during slot 1 the BS will transmit the signal x_(θ) ₁ [k]+x_(θ) ₂ [k] where as before x_(θ) ₁ [k]=W_(θ) ₁ [k]s_(θ) ₁ [k], x_(θ) ₂ [k]=W_(θ) ₂ [k]s_(θ) ₂ [k], and s_(θ) ₁ [k], s_(θ) ₂ [k] denote the 2×1 symbol vectors containing symbols formed using the new packets intended for users θ₁ and θ₂, respectively, with E[s_(θ) _(i) [k]s_(θ) _(i) ^(†)[k]]=I, i∈{1, 2}. In addition, the BS will transmit pilots precoded using the precoder [W_(θ) ₁ [k], W_(θ) ₂ [k]] to the scheduled pair θ₁, θ₂. These pilots enable user θ_(i) to estimate h_(θ) _(i) [k][W_(θ) ₁ [k], W_(θ) ₂ [k]]. User θ₁ can then quantize part of its estimate (using an appropriate quantization codebook) to obtain

User θ₁ then feeds back

to the BS. Similarly user θ₂ feeds back

to the BS.

The BS in slot 2 will transmit the signal vector z[k, 2]

s_(θ) ₂ [θ₃]. Note here that

is available to the BS from the feedback from user θ₁ after the interval θ₃. In addition the BS will transmit a pilot precoded by z[k, 2] which allows user θ_(i), i=1, 2 to estimate h_(θ) _(i) [k]z[k, 2].

Similarly in slot 3, the BS will transmit the signal vector z[k, 3]

s_(θ) ₁ [θ₃]. In addition the BS will transmit a pilot precoded by z[k, 3] which allows user θ_(i), i=1, 2 to estimate h_(θ) _(i) [k]z[k, 3]. These transmissions along with the cross channel feed-forward of

allow for decoding of the desired message at user θ₁. A similar observation holds for user θ₂ as well.

An aspect in the frame based scheduling policy that can be improved is described next. Recall that we have so far assumed that the precoders are obtained as outputs of arbitrary but fixed mappings when given the coarse or both fine and coarse quantized estimates as inputs. One improvement is to change these mappings in every frame. In particular, at the τ^(th) frame we can design suitable mapping functions that also depend on the queue sizes Q[τT]. Indeed separate mapping functions can be designed for each pair at the start of each frame that also consider the queue sizes for that frame.

C. Proof of Proposition 1

To bound the Lyapunov drift we proceed along the lines of [11] and first note that

$Q_{n}\left[(\tau+1)T\right] \leq \left(Q_{n}[\tau T] - \sum\limits_{j=0}^{T-1} R_{n}^{frame}[\tau T+j]\right)^{+} + T r_{n}^{*}[\tau],$

so that (28) holds, which then yields the bound in (29).

$\begin{matrix} {Q_{n}^{2}\left[(\tau+1)T\right] \leq \left(Q_{n}[\tau T] - \sum\limits_{j=0}^{T-1} R_{n}^{frame}[\tau T+j]\right)^{2} + \left(T r_{n}^{*}[\tau]\right)^{2} + 2 T r_{n}^{*}[\tau]\left(Q_{n}[\tau T] - \sum\limits_{j=0}^{T-1} R_{n}^{frame}[\tau T+j]\right)^{+}} & (28) \\ {Q_{n}^{2}\left[(\tau+1)T\right] - \left(Q_{n}[\tau T]\right)^{2} \leq \left(\sum\limits_{j=0}^{T-1} R_{n}^{frame}[\tau T+j]\right)^{2} + \left(T r_{n}^{*}[\tau]\right)^{2} - 2 Q_{n}[\tau T]\left(\sum\limits_{j=0}^{T-1} R_{n}^{frame}[\tau T+j] - T r_{n}^{*}[\tau]\right).} & (29) \end{matrix}$

Using (29) we can bound the T-step Lyapunov drift as in (30).

$\begin{matrix} {\Delta_{T}\left(Q[\tau T]\right) \leq \frac{1}{2T} E\left[\sum\limits_{n=1}^{N}\left(\sum\limits_{j=0}^{T-1} R_{n}^{frame}[\tau T+j]\right)^{2} + \sum\limits_{n=1}^{N}\left(T r_{n}^{*}[\tau]\right)^{2} \mid Q[\tau T]\right] - \frac{1}{T} E\left[\sum\limits_{n=1}^{N} Q_{n}[\tau T]\left(\sum\limits_{j=0}^{T-1} R_{n}^{frame}[\tau T+j] - T r_{n}^{*}[\tau]\right) \mid Q[\tau T]\right]} & (30) \end{matrix}$

Then, since R_(n) ^(frame)[j], ∀n, j can be bounded above by a constant and r*_(n)[τ]≦r_(max), ∀n, τ, we obtain the bound

$\begin{matrix} {\Delta_{T}\left(Q[\tau T]\right) \leq BT + \sum\limits_{n=1}^{N} Q_{n}[\tau T]\, r_{n}^{*}[\tau] - E\left[\sum\limits_{n=1}^{N} Q_{n}[\tau T]\,\frac{1}{T}\left(\sum\limits_{j=0}^{T-1} R_{n}^{frame}[\tau T+j]\right) \mid Q[\tau T]\right]} & (31) \end{matrix}$

where B is an appropriate large enough constant. The RHS in (31) can be manipulated to obtain

$\Delta_{T}\left(Q[\tau T]\right) \leq BT + \sum\limits_{n=1}^{N} Q_{n}[\tau T]\, r_{n}^{*}[\tau] - \sum\limits_{n=1}^{N} Q_{n}[\tau T]\, R_{n}^{*} - E\left[\sum\limits_{n=1}^{N} Q_{n}[\tau T]\left(\frac{1}{T}\left(\sum\limits_{j=0}^{T-1} R_{n}^{frame}[\tau T+j]\right) - R_{n}^{*}\right) \mid Q[\tau T]\right]$

where R*=[R*₁, . . . , R*_(N)]^(T) was defined in (18). Using the Cauchy-Schwarz inequality along with the fact that $\sum_{n=1}^{N} Q_{n}[\tau T] \geq \sqrt{\sum_{n=1}^{N} Q_{n}^{2}[\tau T]}$ we can then further upper bound

$\begin{matrix} {{\Delta_{T}\left( {Q\left\lbrack {\tau \; T} \right\rbrack} \right)} \leq {{BT} + {\sum\limits_{n = 1}^{N}{{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}{r_{n}^{*}\lbrack\tau\rbrack}}} - {\sum\limits_{n = 1}^{N}{{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}R_{n}^{*}}} + {\left( {\sum\limits_{n = 1}^{N}{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}} \right){E\left\lbrack {{{{\frac{1}{T}\left( {\sum\limits_{j = 0}^{T - 1}{R^{frame}\left\lbrack {{\tau \; T} + j} \right\rbrack}} \right)} - R^{*}}}{Q\left\lbrack {\tau \; T} \right\rbrack}} \right\rbrack}}}} & (32) \end{matrix}$

Invoking Lemma 2 along with the fact that R* is also bounded above, we can deduce that by choosing a large enough frame length we can ensure that

$\begin{matrix} {E\left[\left\| \frac{1}{T}\left(\sum\limits_{j=0}^{T-1} R^{frame}[\tau T+j]\right) - R^{*} \right\| \mid Q[\tau T]\right] \leq \varepsilon} & (33) \end{matrix}$

which when used in (32) yields

$\begin{matrix} {{\Delta_{T}\left( {Q\left\lbrack {\tau \; T} \right\rbrack} \right)} \leq {{BT} + {\sum\limits_{n = 1}^{N}{{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}{r_{n}^{*}\lbrack\tau\rbrack}}} - {\sum\limits_{n = 1}^{N}{{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}R_{n}^{*}}} + {\varepsilon {\sum\limits_{n = 1}^{N}{{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}.}}}}} & (34) \end{matrix}$

Recall that any vector R in the ε-interior of Λ satisfies $R \preceq \tilde{R} - \varepsilon\mathbf{1}$ for some $\tilde{R} \in \Lambda$. Then, appealing to the fact that Σ_(n=1) ^(N)Q_(n)[τT]R*_(n) is the optimal value of the LP in (17) together with Lemma 1, we have that

${{\Delta_{T}\left( {Q\left\lbrack {\tau \; T} \right\rbrack} \right)} \leq {{BT} + {\sum\limits_{n = 1}^{N}{{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}{r_{n}^{*}\lbrack\tau\rbrack}}} - {\sum\limits_{n = 1}^{N}{{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}\left( {{\overset{\sim}{R}}_{n} - \varepsilon} \right)}}}},$

from which (20) follows.
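The first two steps of this proof, namely the frame-level queue bound and the per-user drift bound (29), can be sanity-checked numerically. The sketch below is illustrative only: it assumes the virtual queue recursion Q_(n)[k+1]=(Q_(n)[k]−R_(n)[k])⁺+r*_(n)[τ] with non-negative service rates and a non-negative virtual arrival rate held fixed over the frame, and all function names and the random test ranges are hypothetical.

```python
import random


def run_frame(q0, service, r_star):
    """Apply Q[k+1] = (Q[k] - R[k])^+ + r* over one frame; return the final queue."""
    q = q0
    for rate in service:
        q = max(q - rate, 0.0) + r_star
    return q


def check_frame_bounds(q0, service, r_star, tol=1e-6):
    """Check the frame-level queue bound and the drift bound (29) for one user."""
    T = len(service)
    total = sum(service)
    q_end = run_frame(q0, service, r_star)
    # Frame-level bound: Q[(tau+1)T] <= (Q[tau T] - sum_j R[j])^+ + T r*
    queue_ok = q_end <= max(q0 - total, 0.0) + T * r_star + tol
    # Bound (29): Q^2[(tau+1)T] - Q^2[tau T]
    #             <= (sum R)^2 + (T r*)^2 - 2 Q[tau T] (sum R - T r*)
    drift = q_end ** 2 - q0 ** 2
    drift_bound = total ** 2 + (T * r_star) ** 2 - 2 * q0 * (total - T * r_star)
    drift_ok = drift <= drift_bound + tol
    return queue_ok and drift_ok


def random_trials(num_trials=500, seed=0):
    """Run the check on randomly drawn frames (ranges are arbitrary assumptions)."""
    rng = random.Random(seed)
    for _ in range(num_trials):
        T = rng.randint(1, 20)
        service = [rng.uniform(0.0, 5.0) for _ in range(T)]
        q0 = rng.uniform(0.0, 50.0)
        r_star = rng.uniform(0.0, 2.0)
        if not check_frame_bounds(q0, service, r_star):
            return False
    return True
```

Such a check does not replace the argument above; it only confirms that the two inequalities hold on sampled trajectories when the queue, service rates and arrival rate are all non-negative.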

D. Proof Sketch of Theorem 1

We leverage some of the techniques used in [4] but we emphasize that the policies considered in [4] were not frame based and Markov decision processes were not employed there. Using the result in (20) (after assuming a large enough frame length) and subtracting the term VU(r*[τ]) from both sides, we first obtain

$\begin{matrix} {{{\Delta_{T}\left( {Q\left\lbrack {\tau \; T} \right\rbrack} \right)} - {{VU}\left( {r^{*}\lbrack\tau\rbrack} \right)}} \leq {{BT} - {\sum\limits_{n = 1}^{N}{{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}R_{n}}} + {\sum\limits_{n = 1}^{N}{{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}{r_{n}^{*}\lbrack\tau\rbrack}}} - {{{VU}\left( {r^{*}\lbrack\tau\rbrack} \right)}.}}} & (35) \end{matrix}$

Then recalling that r*[τ] is the optimal solution to (14) we have that for any $v: 0 \preceq v \preceq r_{\max}\mathbf{1}$,

$\begin{matrix} {{{\Delta_{T}\left( {Q\left\lbrack {\tau \; T} \right\rbrack} \right)} - {{VU}\left( {r^{*}\lbrack\tau\rbrack} \right)}} \leq {{BT} - {\sum\limits_{n = 1}^{N}{{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}R_{n}}} + {\sum\limits_{n = 1}^{N}{{Q_{n}\left\lbrack {\tau \; T} \right\rbrack}v_{n}}} - {{{VU}(v)}.}}} & (36) \end{matrix}$

Averaging both sides of (36) with respect to Q[τT], we obtain

$\begin{matrix} {{{\frac{1}{T}{E\left\lbrack {L\left( {Q\left\lbrack {\left( {\tau + 1} \right)T} \right\rbrack} \right)} \right\rbrack}} - {\frac{1}{T}{E\left\lbrack {L\left( {Q\left\lbrack {\tau \; T} \right\rbrack} \right)} \right\rbrack}} - {{VE}\left\lbrack {U\left( {r^{*}\lbrack\tau\rbrack} \right)} \right\rbrack}} \leq {{BT} - {\sum\limits_{n = 1}^{N}{{E\left\lbrack {Q_{n}\left\lbrack {\tau \; T} \right\rbrack} \right\rbrack}R_{n}}} + {\sum\limits_{n = 1}^{N}{{E\left\lbrack {Q_{n}\left\lbrack {\tau \; T} \right\rbrack} \right\rbrack}v_{n}}} - {{{VU}(v)}.}}} & (37) \end{matrix}$

Noting that Q_(n)[0]=0, ∀n and summing (37) over τ=0, 1, . . . , t−1 we get

${{\frac{1}{T}{E\left\lbrack {L\left( {Q\lbrack{tT}\rbrack} \right)} \right\rbrack}} - {\sum\limits_{\tau = 0}^{t - 1}{{VE}\left\lbrack {U\left( {r^{*}\lbrack\tau\rbrack} \right)} \right\rbrack}}} \leq {{BTt} - {\sum\limits_{n = 1}^{N}{\sum\limits_{\tau = 0}^{t - 1}{{E\left\lbrack {Q_{n}\left\lbrack {\tau \; T} \right\rbrack} \right\rbrack}R_{n}}}} + {\sum\limits_{n = 1}^{N}{\sum\limits_{\tau = 0}^{t - 1}{{E\left\lbrack {Q_{n}\left\lbrack {\tau \; T} \right\rbrack} \right\rbrack}v_{n}}}} - {{tVU}(v)}}$

which when combined with the fact that $\frac{1}{T}E\left[L\left(Q[tT]\right)\right] \geq 0$ yields

$\begin{matrix} {{\frac{1}{t}{\sum\limits_{n = 1}^{N}{\sum\limits_{\tau = 0}^{t - 1}{{E\left\lbrack {Q_{n}\left\lbrack {\tau \; T} \right\rbrack} \right\rbrack}\left( {R_{n} - v_{n}} \right)}}}} \leq {{BT} + {\frac{1}{t}{\sum\limits_{\tau = 0}^{t - 1}{{VE}\left\lbrack {U\left( {r^{*}\lbrack\tau\rbrack} \right)} \right\rbrack}}} - {{{VU}(v)}.}}} & (38) \end{matrix}$

Next, choosing any $R \in \Lambda_{\varepsilon}$ and $v$ with $0 \preceq v = R - \delta\mathbf{1}$ and $v \preceq r_{\max}\mathbf{1}$ for some δ>0, and substituting in (38), we get that

${{\frac{1}{t}{\sum\limits_{n = 1}^{N}{\sum\limits_{\tau = 0}^{t - 1}{\delta \; {E\left\lbrack {Q_{n}\left\lbrack {\tau \; T} \right\rbrack} \right\rbrack}}}}} \leq {{BT} + {\frac{1}{t}{\sum\limits_{\tau = 0}^{t - 1}{{VE}\left\lbrack {U\left( {r^{*}\lbrack\tau\rbrack} \right)} \right\rbrack}}} - {{VY}(v)}}},$

which, using the component-wise non-decreasing property of the utility function (so that U(r*[τ])≦U(r_(max)1) for every τ), yields

$\begin{matrix} {{{\frac{1}{t}{\sum\limits_{n = 1}^{N}{\sum\limits_{\tau = 0}^{t - 1}{\delta \; {E\left\lbrack {Q_{n}\left\lbrack {\tau \; T} \right\rbrack} \right\rbrack}}}}} \leq {{BT} + {{VU}\left( {r_{\max}1} \right)} - {{VU}(v)}}},{\forall{t.}}} & (39) \end{matrix}$

Then, since Q_(n)[τT+j]≦Q_(n)[τT]+jr_(max), ∀n, j and U(v)≧θ>−∞ for some constant θ, from (39) we can conclude that

$\frac{1}{J}{\sum\limits_{n = 1}^{N}{\sum\limits_{j = 0}^{J - 1}{\delta \; {E\left\lbrack {Q_{n}\lbrack j\rbrack} \right\rbrack}}}}$

is also bounded above by a constant for all J, which proves that all virtual queues are strongly stable under the frame based policy. Letting A_(n)[τT+j]=r*_(n)[τ], ∀0≦j≦T−1, τ=0, 1, . . . , denote the per-slot virtual arrival rate, from strong stability of each virtual queue, uniformly bounded arrival rates and uniform continuity of the utility function, we can deduce that

$\begin{matrix} {{\lim \inf\limits_{J->\infty}{U\left( {\frac{1}{J}{\sum\limits_{j = 0}^{J - 1}{E\left\lbrack {A\lbrack j\rbrack} \right\rbrack}}} \right)}} \leq {\lim \inf\limits_{J->\infty}{{U\left( {\frac{1}{J}{\sum\limits_{j = 0}^{J - 1}{E\left\lbrack {R^{frame}\lbrack j\rbrack} \right\rbrack}}} \right)}.}}} & (40) \end{matrix}$

Finally, setting R=v=r_(ε) ^(opt) in (38), we obtain

$\begin{matrix} {{{\frac{1}{t}{\sum\limits_{\tau = 0}^{t - 1}{{VE}\left\lbrack {U\left( {r^{*}\lbrack\tau\rbrack} \right)} \right\rbrack}}} \geq {{{VU}\left( r_{\varepsilon}^{opt} \right)} - {BT}}},} & (41) \end{matrix}$

which upon invoking the concavity of the utility function and the linearity of the expectation operator yields

$\begin{matrix} {{U\left( {\frac{1}{t}{\sum\limits_{\tau = 0}^{t - 1}{E\left\lbrack {r^{*}\lbrack\tau\rbrack} \right\rbrack}}} \right)} \geq {{U\left( r_{\varepsilon}^{opt} \right)} - {{BT}/{V.}}}} & (42) \end{matrix}$

Notice then that due to the uniform continuity of the utility function,

$\lim \mspace{14mu} \inf_{t->\infty}{U\left( {\frac{1}{t}{\sum\limits_{\tau = 0}^{t - 1}{E\left\lbrack {r^{*}\lbrack\tau\rbrack} \right\rbrack}}} \right)}$

is equal to

$\begin{matrix} {{\lim \inf\limits_{J->\infty}{U\left( {\frac{1}{TJ}{\sum\limits_{j = 0}^{{TJ} - 1}{E\left\lbrack {A\lbrack j\rbrack} \right\rbrack}}} \right)}} = {\lim \inf\limits_{J->\infty}{U\left( {\frac{1}{J}{\sum\limits_{j = 0}^{J - 1}{E\left\lbrack {A\lbrack j\rbrack} \right\rbrack}}} \right)}}} & (43) \end{matrix}$

which when used in (42) yields

$\begin{matrix} {{\lim \inf\limits_{J->\infty}{U\left( {\frac{1}{J}{\sum\limits_{j = 0}^{J - 1}{E\left\lbrack {A\lbrack j\rbrack} \right\rbrack}}} \right)}} \geq {{U\left( r_{\varepsilon}^{opt} \right)} - {{BT}/{V.}}}} & (44) \end{matrix}$

Using (44) and (40) yields the desired result.
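The proof above relies on r*[τ] being the maximizer in (14), that is, of V·U(r)−Σ_(n)Q_(n)[τT]r_(n) over 0⪯r⪯r_(max)1. The utility U is left general in this document; purely for illustration, the sketch below assumes the separable concave utility U(r)=Σ_(n) log(1+r_(n)), for which the maximizer has a per-user closed form. The function names and the choice of utility are assumptions, not part of the described method.

```python
import math


def virtual_rates(queues, V, r_max):
    """Maximize V * sum(log(1 + r_n)) - sum(Q_n * r_n) over 0 <= r_n <= r_max.

    The objective is separable and concave, so each r_n is the stationary
    point V / Q_n - 1 clipped to [0, r_max]; an empty queue gives r_max.
    The log(1 + r) utility is an assumption made only for this example.
    """
    rates = []
    for q in queues:
        if q <= 0.0:
            rates.append(r_max)
        else:
            rates.append(min(max(V / q - 1.0, 0.0), r_max))
    return rates


def objective(rates, queues, V):
    """Evaluate V * U(r) - sum_n Q_n r_n for the assumed utility."""
    utility = sum(math.log1p(r) for r in rates)
    return V * utility - sum(q * r for q, r in zip(queues, rates))


# A long virtual queue throttles the virtual arrival rate of that user.
print(virtual_rates([0.0, 4.0, 100.0], V=10.0, r_max=3.0))  # [3.0, 1.5, 0.0]
```

Under this assumed utility, a larger V pushes the virtual rates toward r_(max), while larger sampled queues pull them down, which mirrors the BT/V utility gap in (44) at the price of larger queue bounds in (39).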

From the foregoing it can be appreciated that, when applied in the communication network, the invention allows network owners to make better use of their hardware resources (including both the communication equipment and the bandwidth resource), which will lead to better profitability from their assets. It also boosts network performance, since operation can be carried out at a finer granularity and with better flexibility.

The foregoing is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that those skilled in the art may implement various modifications without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention. 

1. A method comprising steps: determining an optimal network utilization maximization in a communication system with wireless links in which current and coarse channel state information CSI is available from all users, along with a limited amount of fine CSI by way of a frame based scheduling and feedback under which a virtual queue is associated with each user with virtual rates being determined at start of each frame and policy of each frame being determined by solving a decision process, said determining comprising: i—initializing an index indicative of an interval, a system state and control parameters; and ii—if the interval is not the start of a frame then: 1) determining action for current interval using system state for current interval and the state action frequencies computed for the current frame; 2) performing action for current interval; 3) updating virtual queues and incrementing the interval index; 4) updating system state for the current interval; wherein the above steps are carried out by computer processing.
 2. The method of claim 1, wherein iii—if the interval is the start of a frame then: 5) sampling of virtual queues; 6) determining virtual arrival rates using sampled virtual queue rates; 7) using sampled virtual queue to determine state-action frequencies by a linear process.
 3. The method of claim 1, wherein the system state at any interval consists of coarse channel estimates reported by all users for that interval, the system state includes for each possible pair of users, the fine and coarse channel estimates from both users in that pair for the recent-most prior interval on which that user pair was scheduled and the index of that interval.
 4. The method of claim 1, wherein step ii-2), performing action for current interval, comprises the action being performed over three orthogonal slots in an interval, with a first slot of new packets being transmitted to a selected user pair whereas in slots 2 and 3 interference resolution is performed for the selected pair by sending interference resolving packets for a previous pending transmission involving the selected pair.
 5. The method of claim 4, wherein the action being performed over three orthogonal slots comprises, the overall transmit precoding matrix being denoted by the matrix [W_(u) ₁ [k], W_(u) ₂ [k]] where W_(u) ₁ [k], W_(u) ₂ [k]∈C^(M_(t)×2), letting x_(u) ₁ [k]=W_(u) ₁ [k]s_(u) ₁ [k], x_(u) ₂ [k]=W_(u) ₂ [k]s_(u) ₂ [k], where s_(u) ₁ [k], s_(u) ₂ [k] denote 2×1 symbol vectors containing symbols formed using new packets intended for user u₁ and u₂, respectively, the signal transmitted in the first slot being x_(u) ₁ [k]+x_(u) ₂ [k] and the received signals at both users are y _(u) ₁ [k,1]=h _(u) ₁ [k](x _(u) ₁ [k]+x _(u) ₂ [k])+n _(u) ₁ [k,1],  (2) y _(u) ₂ [k,1]=h _(u) ₂ [k](x _(u) ₁ [k]+x _(u) ₂ [k])+n _(u) ₂ [k,1],  (3) with the allocated transmission power for scheduled user u_(i) being the norm ∥W_(u) _(i) [k]∥².
 6. The method of claim 1, wherein step ii—3), updating of the virtual queues, comprises using Q _(n) [k+1]=(Q _(n) [k]−R _(n) ^(Ψ*) ^(Q[τT]) [k])⁺ +r* _(n)[τ] where the virtual queue maintained for each user is denoted as Q_(n) [k], k=0, 1, . . . & n=1, . . . , N, virtual arrival rate for a user n is set at r*_(n)[τ] in each interval in the τ^(th) frame, and R_(n) ^(ψ) ^(Q[τT]) [k] denotes the service rate of user n in each time interval k and τ^(th) frame.
 7. The method of claim 1, wherein step ii—4) updating system state for current interval, comprises updating coarse channel state information CSI reported by all users for the current interval and a tuple corresponding to the user pair scheduled in the previous interval is updated using the fine and coarse CSI received from both users in that pair for the previous interval and respective fine channel estimates are fed forward to both the users in that pair.
 8. The method of claim 2, wherein step iii—6), determining virtual arrival rates, comprises using sampled virtual queues in $\max\limits_{r:\,0 \preceq r \preceq r_{\max}\mathbf{1}} V \cdot U(r) - \sum\limits_{n=1}^{N} Q_{n}[\tau T]\, r_{n},$ where the virtual queue maintained for each user is denoted as Q_(n)[k], k=0, 1, . . . & n=1, . . . , N, virtual arrival rate for a user n is set at r*_(n)[τ] in each interval in the τ^(th) frame, V is a positive constant, and U (r) is a utility value.
 9. The method of claim 1, wherein, step iii—7), determining action state frequencies comprises solving linear program equation $\max\limits_{x}\sum\limits_{\underline{s},\underline{a}} q^{T} R\left(\underline{s},\underline{a}\right) x\left(\underline{s},\underline{a}\right)$ $\text{s.t.}\ x \in \underline{X},$ where R_(n) (s, a) denotes the achieved transmission rate for user n when action a is taken and the system state is s, state action frequencies are denoted by {x(s, a)}, s∈S, a∈A, where each x(s, a) lies in the unit interval [0, 1] and represents the frequency that the system state is at s and action a is taken. The state action frequencies need to satisfy the normalization equation, q is a virtual queue length, X is a formed state action polytope and x denotes any vector of state action frequencies lying in X.
 10. A system comprising: a communication system with wireless links in which current and coarse channel state information CSI is available from all users, along with a limited amount of fine CSI by way of a frame based scheduling and feedback under which a virtual queue is associated with each user with virtual rates being determined at start of each frame and policy of each frame being determined by solving a decision process for determining an optimal network utilization maximization, the system includes computer processing for carrying out the following: i. initializing an index indicative of an interval, a system state and control parameters; and ii. if the interval is not the start of a frame then: 1) determining action for current interval using system state for current interval and the state action frequencies computed for the current frame; 2) performing action for current interval; 3) updating virtual queues and incrementing the interval index; and 4) updating system state for the current interval.
 11. The system of claim 10, wherein iii. if the interval is the start of a frame then: 5) sampling of virtual queues; 6) determining virtual arrival rates using sampled virtual queue rates; 7) using sampled virtual queue to determine state-action frequencies by a linear process.
 12. The system of claim 10, wherein the system state at any interval consists of coarse channel estimates reported by all users for that interval, the system state includes for each possible pair of users, the fine and coarse channel estimates from both users in that pair for the recent-most prior interval on which that user pair was scheduled and the index of that interval.
 13. The system of claim 10, wherein step ii—2), performing action for current interval, comprises the action being performed over three orthogonal slots in an interval, with a first slot of new packets being transmitted to a selected user pair whereas in slots 2 and 3 interference resolution is performed for the selected pair by sending interference resolving packets for a previous pending transmission involving the selected pair.
 14. The system of claim 13, wherein the action being performed over three orthogonal slots comprises, the overall transmit precoding matrix being denoted by the matrix [W_(u) ₁ [k], W_(u) ₂ [k]] where W_(u) ₁ [k], W_(u) ₂ [k]∈C^(M_(t)×2), letting x_(u) ₁ [k]=W_(u) ₁ [k]s_(u) ₁ [k], x_(u) ₂ [k]=W_(u) ₂ [k]s_(u) ₂ [k], where s_(u) ₁ [k], s_(u) ₂ [k] denote 2×1 symbol vectors containing symbols formed using new packets intended for user u₁ and u₂, respectively, the signal transmitted in the first slot being x_(u) ₁ [k]+x_(u) ₂ [k] and the received signals at both users are y _(u) ₁ [k,1]=h _(u) ₁ [k](x _(u) ₁ [k]+x _(u) ₂ [k])+n _(u) ₁ [k,1],  (2) y _(u) ₂ [k,1]=h _(u) ₂ [k](x _(u) ₁ [k]+x _(u) ₂ [k])+n _(u) ₂ [k,1],  (3) with the allocated transmission power for scheduled user u_(i) being the norm ∥W_(u) _(i) [k]∥².
 15. The system of claim 10, wherein step ii—3), updating of the virtual queues, comprises using Q _(n) [k+1]=(Q _(n) [k]−R _(n) ^(Ψ*) ^(Q[τT]) [k])⁺ +r* _(n)[τ], where the virtual queue maintained for each user is denoted as Q_(n)[k], k=0, 1, . . . & n=1, . . . , N, virtual arrival rate for a user n is set at r*_(n)[τ] in each interval in the τ^(th) frame, and R_(n) ^(ψ) ^(Q[τT]) [k] denotes the service rate of user n in each time interval k and τ^(th) frame.
 16. The system of claim 10, wherein step ii—4) updating system state for current interval, comprises updating coarse channel state information CSI reported by all users for the current interval and a tuple corresponding to the user pair scheduled in the previous interval is updated using the fine and coarse CSI received from both users in that pair for the previous interval and respective fine channel estimates are fed forward to both the users in that pair.
 17. The system of claim 12, wherein step iii—6), determining virtual arrival rates, comprises using sampled virtual queues in $\max\limits_{r:\,0 \preceq r \preceq r_{\max}\mathbf{1}} V \cdot U(r) - \sum\limits_{n=1}^{N} Q_{n}[\tau T]\, r_{n},$ where the virtual queue maintained for each user is denoted as Q_(n)[k], k=0, 1, . . . & n=1, . . . , N, virtual arrival rate for a user n is set at r*_(n)[τ] in each interval in the τ^(th) frame, V is a positive constant, and U (r) is a utility value.
 18. The system of claim 12, wherein, step iii—7), determining action state frequencies comprises solving linear program equation (17) $\max\limits_{x}\sum\limits_{\underline{s},\underline{a}} q^{T} R\left(\underline{s},\underline{a}\right) x\left(\underline{s},\underline{a}\right)$ $\text{s.t.}\ x \in \underline{X},$ where R_(n) (s, a) denotes the achieved transmission rate for user n when action a is taken and the system state is s, state action frequencies are denoted by {x(s, a)}, s∈S, a∈A, where each x(s, a) lies in the unit interval [0, 1] and represents the frequency that the system state is at s and action a is taken. The state action frequencies need to satisfy the normalization equation, q is a virtual queue length, X is a formed state action polytope and x denotes any vector of state action frequencies lying in X.